(54) Title: METHODS OF PRODUCING NOVEL ENZYMES

(57) Abstract: Methods of obtaining enzymes that bind a target substrate and catalyse desired reactions. α/β-barrel proteins are
categorised into two classes based on catalytic lid structure. Lids can be grafted onto scaffolds, with additional minor modifications
at conserved and non-conserved residues, to provide candidate product enzymes for screening for the desired properties. Design of
a novel enzyme which binds a target substrate and catalyses a reaction of choice is facilitated by selection of a scaffold which binds
the substrate and of a catalytic lid of the correct class for the desired reaction. Targeted or focussed mutagenesis may be used to
refine substrate binding and catalysis.

METHODS OF PRODUCING NOVEL ENZYMES

The present invention relates to protein design, 
specifically design of enzymes. It is based on work of the 
inventors in categorising α/β-barrel proteins into two
classes based on catalytic lid structure, and recognising 
that enzymes which catalyse a given class of reactions are 
found in one or other of the two classes. Design of a novel 
enzyme which binds a target substrate and catalyses a 
reaction of choice is facilitated by selection of a scaffold 
which binds the substrate and of a catalytic lid of the 
correct class for the desired reaction. Targeted or 
focussed mutagenesis may be used to refine substrate binding 
and catalysis. 

Enzymes are Nature's catalysts. They are proteins that have
evolved to bind specific substrates and catalyse specific
reactions at optimal efficiency and yield under conditions
in the cell. However, only a few highly active new enzymes
have been produced by protein engineering, and no general
methodology has been achieved. Such catalysts as have been made have
employed specific features unique to individual proteins 
(Structure and Mechanism in Protein Science: A Guide to 
Enzyme Catalysis and Protein Folding. A. Fersht (WH Freeman 
and Co, 1999), chapters 15 and 16). The field of catalytic 
antibodies in which the naturally binding proteins have been 
evolved to become catalysts has also failed in general to 
produce highly active molecules that rival natural enzymes 
(Fersht, supra, pp 60, 361). 


Natural evolution involves mutation and selection. Random
mutation and selection in vitro is, without simplifying
rules, too difficult and time consuming because a large
number of mutations generally have to be made to evolve a
new catalytic activity. The present inventors have
appreciated that Nature has evolved design principles to
diversify α/β-barrel protein activity more rapidly, and here
provide rules for novel enzyme design that greatly reduce
the number and choice of residues to mutate.

The α/β-barrel

Proteins adopt many different topologies of folded
structure. However, one particular fold, the α/β-barrel, is
the most common, accounting for some 10% of known enzymes.
The α/β-barrel is clearly an important target as the
framework for novel protein design, but despite considerable
efforts no one has deciphered and demonstrated
experimentally how Nature is able to use this design of fold
so effectively.

It has been speculated previously that the binding sites in
α/β-barrel enzymes may have evolved by divergent evolution,
so acquiring the ability to bind other substrates (cited in
Fersht, supra). Specifically, an archetypal enzyme that
catalyses a particular reaction on a particular substrate
may evolve into a family of enzymes catalysing the same
reaction, but on a variety of substrates.

The inventors have analysed a particular structure of
α/β-barrel enzymes, called the "active-site lid", that is
involved primarily with catalysis rather than specificity of
binding (see below). The lid contains amino acid residues
whose function is to provide catalytic chemical groups in the
active site. The lids are herein divided into two main
classes. The inventors have identified a correlation
between the class of the lid and the kind of mechanism
catalysed by the enzyme. From this, the present invention
provides for grafting a template lid onto a selected barrel
framework, or modifying an underlying framework to provide
an altered lid (e.g. a lid of the alternative class), and
then subjecting the lid to targeted mutagenesis and
selection, to create new enzymes catalysing a desired
reaction.



GLOSSARY

Helix

A helix is formed by a polypeptide chain with repeating phi
and psi angles. Its geometry is defined by the number of
residues per turn and the rise per residue. In principle
the polypeptide chain can form right- and left-handed helices
with a range of pitches (see Fersht, supra, and Introduction
to Protein Structure, 2nd Edition, Branden, C. and Tooze,
J. (Garland Publishing Inc., New York, 1999)).



Loop

A protein loop is any stretch of non-regular polypeptide
chain connecting secondary structures. Short loop regions
adopt a restricted set of conformations and loop families
have been recognised in specific supersecondary structures.

Beta Sheet (β sheet)

These structures are formed from residues in an extended
conformation with psi/phi bond angle pairs in the wide
allowed region in the upper left hand corner of the
Ramachandran plot. The strands of the beta sheet are not
fully extended, due to the constraints of hydrogen bonding,
and the sheets appear pleated. In addition there is a
left-handed twist between adjacent strands when looking at right
angles to the strand direction (Chothia, 1973, J. Mol. Biol.
75: 295-302). The beta strands in a sheet can be arranged to
form parallel, antiparallel or mixed sheets. Refer to
Richardson, (1977) Nature 268: 495-500.

Beta Strand (β Strand)

A beta strand describes a single length of polypeptide chain 
that forms part of a beta sheet. 

Parallel Beta Sheets 

This is a beta-pleated sheet in which successive beta
strands all lie parallel in three dimensions. Such sheets
have evenly spaced hydrogen bond pairs that lie at an angle
to the beta strands.

Beta-Alpha-Beta Units: 

Beta-alpha-beta units consist of two parallel hydrogen 
bonded beta strands connected by a loop containing at least 
one alpha helix. 


Beta Barrel 

In some instances large anti-parallel (or parallel) sheets 
can roll up completely to join edges and form a cylinder or 
closed 'barrel', in which the first strand is hydrogen 
bonded to the last. 

Topology (fold family)

Structures are grouped into fold families at this level
depending on both the overall shape and connectivity of the
secondary structures. This is done using the structure
comparison algorithm SSAP (Taylor and Orengo (1989) J. Mol.
Biol. 208: 1-22 and (1989) Protein Eng. 2: 505-519).
Parameters for clustering domains into the same fold family
have been determined by empirical trials throughout the
Brookhaven databank. Structures which have a SSAP score of
70 and where at least 60% of the larger protein matches the
smaller protein are assigned to the same T level or fold
family.
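
As a minimal sketch of the fold-family rule just described, the following Python function applies the two stated thresholds to a pair of domains. The function and argument names are hypothetical, the SSAP score and matched fraction are taken as already computed, and reading "a SSAP score of 70" as "at least 70" is an assumption.

```python
def same_fold_family(ssap_score, overlap_fraction,
                     min_ssap=70.0, min_overlap=0.60):
    """Return True if a pair of domains meets the fold-family thresholds.

    ssap_score: SSAP structure-comparison score for the pair.
    overlap_fraction: fraction (0.0-1.0) of the larger protein that
        matches the smaller protein.
    Treating the SSAP threshold as ">= 70" is an assumption.
    """
    return ssap_score >= min_ssap and overlap_fraction >= min_overlap


# Illustrative values only, not taken from any real comparison.
print(same_fold_family(82.5, 0.75))
print(same_fold_family(82.5, 0.40))
```

Keeping the thresholds as keyword arguments makes it easy to see how the assignment changes if the empirically chosen cut-offs are varied.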

Topology cartoons 

Protein topology cartoons are simplified representations of 
protein folds. These diagrams are two-dimensional schematic
representations of protein structures. They represent the 
structure as a sequence of secondary structure elements 
(helices and strands), and illustrate the relative spatial 
position and direction of these elements. 

Homologous Superfamily

This level groups together protein domains which are thought
to share a common ancestor and can therefore be described as
homologous. Similarities are identified first by sequence
comparisons and subsequently by structure comparison using
SSAP. Structures are clustered into the same homologous
superfamily if they satisfy one of the following criteria:

(i) Sequence identity >= 35%, 60% of larger structure
equivalent to smaller

(ii) SSAP score >= 80.0 and sequence identity >= 20%,
60% of larger structure equivalent to smaller

(iii) SSAP score >= 80.0, 60% of larger structure equivalent
to smaller, and domains have related functions
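
Read as three alternatives (sequence identity of at least 35%; SSAP score of at least 80.0 with sequence identity of at least 20%; or SSAP score of at least 80.0 with related functions; each with 60% of the larger structure equivalent to the smaller), the superfamily criteria can be sketched in Python as below. That reading of the list, and every name in the code, is an assumption made for illustration.

```python
def same_homologous_superfamily(seq_identity, ssap_score, overlap_fraction,
                                related_function=False):
    """Apply the three alternative superfamily criteria (assumed reading).

    seq_identity: percent sequence identity (0-100).
    ssap_score: SSAP structure-comparison score.
    overlap_fraction: fraction (0.0-1.0) of the larger structure that is
        equivalent to the smaller.
    related_function: True if the domains are judged to have related
        functions (a manual judgement, supplied by the caller).
    """
    # Every criterion requires 60% of the larger structure to be matched.
    if overlap_fraction < 0.60:
        return False
    # Criterion (i): high sequence identity alone.
    if seq_identity >= 35.0:
        return True
    # Criterion (ii): structural similarity plus moderate identity.
    if ssap_score >= 80.0 and seq_identity >= 20.0:
        return True
    # Criterion (iii): structural similarity plus related function.
    return ssap_score >= 80.0 and related_function
```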

Sequence Families 

Structures within each homologous superfamily are further
clustered on sequence identity, using CATH (see below).
Domains clustered in the same sequence families have
sequence identities >35% (with at least 60% of the larger
domain equivalent to the smaller), indicating highly similar
structures and functions (Thornton et al., J. Mol. Biol.
293, 333-342 (1999); Taylor and Orengo, J. Mol. Biol. 208,
1-22 (1989a); Taylor and Orengo, Protein Eng. 2, 505-519
(1989b)).

Active site lid in α/β-barrel proteins

This is the structure that covers the active site, closing
and shielding it from solvent.

α/β-barrel proteins are identified in the CATH and SCOP
databases (CATH - A Hierarchic Classification of Protein
Domain Structures, Orengo et al., Structure 5, 1093-1108
(1997), http://www.biochem.ucl.ac.uk/bsm/cath/; SCOP -
Murzin et al., J. Mol. Biol. 247: 536-540 (1995), and see also
http://scop.mrc-lmb.cam.ac.uk/scop), and in the dedicated
database for such proteins, TIM-DB, at
http://argo.urv.es/-pujadas/TIM/ (Pujadas & Palau,
Biologia, Bratislava, 54 (3): 231-254 (1999)).

A list of α/β-barrel proteins to which aspects of the
present invention can be applied, or which can be employed
in the present invention, appears below as Table IV. Each
of these has a scaffold including a binding site for a
substrate or ligand, and an active site lid. In accordance
with the present invention the scaffold or binding site of
any of these may be employed either to bind a substrate of
choice or as a starting point for mutagenesis and selection
for ability to bind the chosen substrate. Likewise, the
active site lid of any of these may be grafted onto a chosen
scaffold and employed either to catalyse the desired
reaction on the chosen substrate or as a starting point for
mutagenesis and selection for ability to catalyse the
desired reaction. As explained, an active site lid for a
desired reaction or type of reaction may be chosen at least
partly on the basis of its classification as a Class I or
Class II α/β-barrel as defined herein.

Table III shows an overview of different reaction mechanisms
for which α/β-barrel enzymes have been found to be active.
In selection of a particular architecture for the active
site in accordance with the present invention, the kind of
reaction mechanism involved (e.g. proton abstraction,
proton abstraction after enolisation, proton abstraction
from Schiff base intermediates, metal activated hydrolysis,
attack of amino-acid side-chain nucleophiles on
specifically activated atoms in the substrate, and so on)
may be taken into account. Thus, where a reaction of a
particular type is desired, an active site lid of the
appropriate class may be selected, preferably an active site
lid which catalyses the desired reaction or a similar
reaction (albeit with a different substrate).

All documentation cited herein is incorporated by reference,
including internet sites and databases (especially in the
form available at the date of filing of the present
specification, but where possible including the latest
updates).

Brief Description of the Figures

Figure 1 shows a schematic representation and structural
features of the two classes of α/β-barrel proteins,
illustrated with reference to PRAI (Class I) and IGPS (Class
II). The eight β-strands of the barrel are indicated by
triangles. Alpha helices are indicated by rectangles and
the constant regions, phosphate binding (β7α7 and β8α8) and
the anthranilate binding site (β2α2), by dark loops. For
Class I (PRAI group) structures, the main feature of the
active site lid (β6α6) is represented by the loop in white
with a shadow. The structure is a view from the top of the
barrel which constitutes the active site of PRAI. The lotus
leaf lid β6α6 is indicated by a white ribbon. The β1α1 loop
is the shorter of the two white ribbons. The constant
regions (phosphate binding site and anthranilate binding
site) are shaded. The clover leaf (shadow) lid of the Class
II structure is also shown, which has three principal
elements: the extra N-terminal segment; loop β1α1; and loop
β6α6 (all dark). The other structural features are indicated
as above. The structure is a top view of the Class II (IGPS
group) barrel. The IGPS scaffold, extra N-terminal residues,
and the β1α1 and β6α6 loops are indicated by dark ribbons. The
constant regions are shaded.

Figure 2 illustrates the reactions catalysed by
phosphoribosyl anthranilate isomerase (PRAI) and
indoleglycerol-phosphate synthase (IGPS). The PRAI reaction
is an intramolecular redox reaction (Amadori rearrangement)
of N-(5-phosphoribosyl)anthranilate (PRA) to
1-(2-carboxyphenylamino)-1-deoxyribulose 5-phosphate (CdRP). In
the IGPS reaction, the substrate CdRP undergoes an
irreversible ring-closure to indoleglycerol phosphate (IGP)
with release of CO2 and H2O. Chemical reduction of CdRP by
borohydride produces the substrate analogue rCdRP for IGPS.
The rCdRP is an inhibitor of both enzymes.

Figure 3 shows a sequence alignment of in vitro evolved PRAI
(ivePRAI), PRAI and IGPS. The single-letter code for amino
acid residues is used. Residues in IGPS (71-254): identities
167/184 (90%); similarities 171/184 (92%). Residues in PRAI
(375-396): identities 8/18 (44%); similarities 12/18 (66%).
Identities: outline, bold and shade. Similarities: outline and shade.

Figure 4 shows a protein topology ("TOPS") cartoon for a
protein (triangular symbols represent beta strands and
circular ones helices).

Figure 5 shows a protein topology ("TOPS") cartoon for
another protein (triangular symbols represent beta strands
and circular ones helices).

Figure 6 illustrates the topology of a protein with reference to
its sequence.

Sequence identities are calculated using the program Blast,
using the following parameters: H=0, V=-20, B=20, S=40,
-ctxfactor=1.00, E=64.8038 (Altschul et al., (1990) J. Mol.
Biol. 215: 403-410).

Aspects and embodiments of the present invention are
disclosed throughout this text, and generally provide
methods of obtaining novel enzymes, or in particular methods
of obtaining an enzyme that catalyses a desired reaction on
a target substrate. The invention also provides a method of
classifying α/β-barrel proteins into two classes by means of
applying criteria disclosed herein, and a method whereby an
α/β-barrel protein is appointed as a member of Class I or
Class II in accordance with these criteria. Following
classification, a method according to the invention may
generally provide for alteration of the active site lid of
an α/β-barrel protein of Class I to convert it into Class
II, or may generally provide for alteration of the active
site lid of an α/β-barrel protein of Class II to convert
it into Class I. Moreover, the present invention provides
for modification of an α/β-barrel protein which catalyses a
first reaction of a given reaction type into an α/β-barrel
protein which catalyses a second reaction of that reaction
type, and also provides for modification of an α/β-barrel
protein which catalyses a first reaction of a given reaction
type into an α/β-barrel protein which catalyses a second
reaction of a different reaction type. By means of one or
more of such methods, an enzyme which catalyses a desired
reaction on a target substrate may be obtained, and this may
involve conversion of an enzyme from one of Class I and
Class II to the other (especially where a protein is
modified to catalyse a reaction of a different type), or may
involve maintenance of a structure conforming to Class I or
to Class II, while altering substrate binding specificity
and/or reaction catalysed.

A method of obtaining an enzyme in accordance with the
present invention may involve modifying one or more, or
preferably a combination, of the following regions: the
N-terminal segment, the β1-α1 loop, and the β6-α6 loop,
especially where an enzyme of one of Class I and Class II is
converted into the other Class. In preferred embodiments,
one or more of the following may additionally be mutated:
extra domains (between β3α3) and the C-terminal segment (after β8).

As is discussed in detail elsewhere herein, a scaffold may
be chosen (for engineering of a desired active site lid)
from any α/β-barrel protein, but is preferably chosen to be
one which binds the target substrate of interest. Where
such a scaffold is not available, a second preference is for
an α/β-barrel protein which binds a similar substrate, i.e. a
molecule with as much structural similarity as possible.
Mutation of the scaffold may then be used to alter its
binding specificity so it binds the target substrate. The
regions which may be mutated in order to alter substrate
binding specificity are discussed elsewhere herein.
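
The order of preference for choosing a scaffold can be sketched as a small Python function. This is only an illustration: the candidate tuples, the 0.0-1.0 similarity score and all names are hypothetical, and how structural similarity between substrates would actually be scored is not specified here.

```python
def choose_scaffold(candidates, target):
    """Pick a starting scaffold for a target substrate.

    candidates: list of (scaffold_name, bound_substrate, similarity)
        tuples, where similarity is a hypothetical 0.0-1.0 score of how
        closely bound_substrate resembles the target substrate.
    Returns (scaffold_name, needs_binding_mutagenesis).
    """
    if not candidates:
        raise ValueError("no candidate scaffolds supplied")
    # First preference: a scaffold that already binds the target.
    for name, substrate, _ in candidates:
        if substrate == target:
            return name, False
    # Second preference: the scaffold whose substrate is most similar;
    # its binding site would then be mutated and selected for binding.
    name, _, _ = max(candidates, key=lambda c: c[2])
    return name, True


# Placeholder names, not real proteins or substrates.
candidates = [("scaffold_A", "substrate_X", 0.4),
              ("scaffold_B", "substrate_Y", 0.8)]
print(choose_scaffold(candidates, "substrate_Z"))
```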



A method of obtaining an enzyme in accordance with the
present invention may be used to provide a protein which
comprises an α/β-barrel scaffold which binds a target
substrate and a catalytic lid which catalyses a desired
reaction. The scaffold may be provided from an α/β-barrel
which naturally binds said target substrate, or may be
provided by a method comprising mutation of an α/β-barrel and
selection for binding to said target substrate. Such
enzymes are provided as further aspects of the present
invention, as is their use in a method of catalysing the
desired reaction on the target substrate, along with other
aspects and embodiments disclosed herein.

A protein or polypeptide according to the present invention
may be considered "chimaeric", in embodiments where the
scaffold is of one protein and the active site lid is of
another protein. The resultant chimaera may represent a
"humanised" enzyme, wherein a human enzyme is modified to
introduce an enzymatic activity of a non-human, e.g. other
mammalian or microbial, enzyme. The present invention
allows for minimal, minor modification to a parent scaffold
(e.g. human) to introduce the desired enzymatic activity,
minimising effects on immunogenicity in a human of the
product enzyme. Usually, in addition to grafting of an
active site lid onto a scaffold, or engineering a protein
with a particular scaffold to alter its active site lid,
some further mutation may be required to obtain the desired
catalysis on the target substrate or may be desirable to
increase affinity for substrate and/or rate of catalysis.
Appropriate regions of proteins for such targeted mutation
are discussed in detail elsewhere herein, and include
catalytic residues, β1-α1 loop and/or β2-α2 loop (for Class
I), metal binding site, N-terminal extension and/or
C-terminal extension (for Class II).

A suitable selection system may be employed to identify
mutations with the desired effect. For instance, phage
display may be used to identify members of a population of
mutated proteins which bind a target substrate. Selection
systems, including in vivo selection systems, for catalysis
of the desired reaction may be available or can be designed,
as exemplified experimentally below.

A convenient way of producing a polypeptide according to the 
present invention is to express nucleic acid encoding it, by 
use of nucleic acid in an expression system. 

Accordingly the present invention also provides in various
aspects nucleic acid encoding the polypeptides of the
invention, which may be used for production of the encoded 
polypeptide. 

Generally, nucleic acid encoding a polypeptide in accordance with
the present invention is provided as an
isolate, in isolated and/or purified form, or free or
substantially free of material with which it is naturally
associated, such as free or substantially free of nucleic
acid flanking the gene in the human genome, except possibly
one or more regulatory sequence(s) for expression. Nucleic
acid may be wholly or partially synthetic and may include
genomic DNA, cDNA or RNA. Where nucleic acid according to
the invention includes RNA, reference to the sequence shown
should be construed as encompassing reference to the RNA
equivalent, with U substituted for T.
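
The RNA equivalent of a DNA sequence, with U substituted for T, can be illustrated with a short Python sketch. The function name and the input check are assumptions added for the example, and the sequence used is an arbitrary placeholder rather than one from this specification.

```python
def rna_equivalent(dna_sequence):
    """Return the RNA equivalent of a DNA sequence (U substituted for T)."""
    # Reject anything other than the four DNA bases (either case).
    unexpected = set(dna_sequence) - set("ACGTacgt")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return dna_sequence.replace("T", "U").replace("t", "u")


print(rna_equivalent("ATGGCT"))  # AUGGCU
```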

Nucleic acid sequences encoding a polypeptide in accordance
with the present invention can be readily prepared by the
skilled person using the information and references
contained herein and techniques known in the art (for
example, see Sambrook, Fritsch and Maniatis, "Molecular
Cloning, A Laboratory Manual", Cold Spring Harbor Laboratory
Press (1989), and Ausubel et al., Current Protocols in
Molecular Biology, John Wiley and Sons (1994)), given the
nucleic acid sequence and clones available. These
techniques include (i) the use of the polymerase chain
reaction (PCR) to amplify samples of such nucleic acid, e.g.
from genomic sources, (ii) chemical synthesis, or (iii)
preparing cDNA sequences. DNA encoding a polypeptide may be
generated and used in any suitable way known to those of
skill in the art, including by taking encoding DNA,
identifying suitable restriction enzyme recognition sites
either side of the portion to be expressed, and cutting out
said portion from the DNA. The portion may then be operably
linked to a suitable promoter in a standard commercially
available expression system. Another recombinant approach
is to amplify the relevant portion of the DNA with suitable
PCR primers. Modifications to the relevant sequence may be
made, e.g. using site directed mutagenesis, to lead to the
expression of modified polypeptide or to take account of
codon preference in the host cells used to express the
nucleic acid.

In order to obtain expression of the nucleic acid sequences, 
the sequences may be incorporated in a vector having one or 
more control sequences operably linked to the nucleic acid 
to control its expression. The vectors may include other 
sequences such as promoters or enhancers to drive the 
expression of the inserted nucleic acid, nucleic acid 
sequences so that the polypeptide is produced as a fusion 
and/or nucleic acid encoding secretion signals so that the 
polypeptide produced in the host cell is secreted from the 
cell. Polypeptide can then be obtained by transforming the 
vectors into host cells in which the vector is functional, 
culturing the host cells so that the polypeptide is produced 
and recovering the polypeptide from the host cells or the 
surrounding medium. Prokaryotic and eukaryotic cells are 
used for this purpose in the art, including strains of E. 
coli, yeast, and eukaryotic cells such as COS or CHO cells. 

Thus, the present invention also encompasses a method of
making a polypeptide (as disclosed), the method including
expression from nucleic acid encoding the polypeptide
(generally nucleic acid according to the invention). This
may conveniently be achieved by growing in culture a host
cell containing such a vector, under appropriate
conditions which cause or allow expression of the
polypeptide. Polypeptides may also be expressed in in vitro
systems, such as reticulocyte lysate.

Systems for cloning and expression of a polypeptide in a
variety of different host cells are well known. Suitable
host cells include bacteria, eukaryotic cells such as
mammalian and yeast, and baculovirus systems. Mammalian
cell lines available in the art for expression of a
heterologous polypeptide include Chinese hamster ovary
cells, HeLa cells, baby hamster kidney cells, COS cells and
many others. A common, preferred bacterial host is E. coli.

Suitable vectors can be chosen or constructed, containing
appropriate regulatory sequences, including promoter
sequences, terminator fragments, polyadenylation sequences,
enhancer sequences, marker genes and other sequences as
appropriate. Vectors may be plasmids, viral e.g. 'phage, or
phagemid, as appropriate. For further details see, for
example, Molecular Cloning: a Laboratory Manual: 2nd
edition, Sambrook et al., 1989, Cold Spring Harbor
Laboratory Press. Many known techniques and protocols for
manipulation of nucleic acid, for example in preparation of
nucleic acid constructs, mutagenesis, sequencing,
introduction of DNA into cells and gene expression, and
analysis of proteins, are described in detail in Current
Protocols in Molecular Biology, Ausubel et al. eds., John
Wiley & Sons, 1992.

Thus, a further aspect of the present invention provides a
host cell containing encoding nucleic acid as disclosed
herein.

The nucleic acid of the invention may be integrated into the
genome (e.g. chromosome) of the host cell. Integration may
be promoted by inclusion of sequences which promote
recombination with the genome, in accordance with standard
techniques. The nucleic acid may be on an extra-chromosomal
vector within the cell, or otherwise identifiably
heterologous or foreign to the cell.

A still further aspect provides a method which includes
introducing the nucleic acid into a host cell. The
introduction, which may (particularly for in vitro
introduction) be generally referred to without limitation as
"transformation", may employ any available technique. For
eukaryotic cells, suitable techniques may include calcium
phosphate transfection, DEAE-Dextran, electroporation,
liposome-mediated transfection and transduction using
retrovirus or other virus, e.g. vaccinia or, for insect
cells, baculovirus. For bacterial cells, suitable
techniques may include calcium chloride transformation,
electroporation and transfection using bacteriophage. As an
alternative, direct injection of the nucleic acid could be
employed.

Marker genes such as antibiotic resistance or sensitivity 
genes may be used in identifying clones containing nucleic 
acid of interest, as is well known in the art. 

The introduction may be followed by causing or allowing
expression from the nucleic acid, e.g. by culturing host
cells (which may include cells actually transformed although
more likely the cells will be descendants of the transformed
cells) under conditions for expression of the gene, so that
the encoded polypeptide is produced. If the polypeptide is
expressed coupled to an appropriate signal leader peptide it
may be secreted from the cell into the culture medium.

Following production by expression, a polypeptide may be
isolated and/or purified from the host cell and/or culture
medium, as the case may be, and subsequently used as
desired, e.g. in the formulation of a composition which may
include one or more additional components, such as a
pharmaceutical composition which includes one or more
pharmaceutically acceptable excipients, vehicles or
carriers.

Further aspects and embodiments of the present invention
will be apparent to those skilled in the art, in view of the
present disclosure. To facilitate understanding of various
aspects of the invention the following explanation of the
inventors' work and how it is applicable in the present
invention is provided, supplementing the experimental work
described in further detail later.

Classification of α/β-Barrels

The basic α/β-barrel framework consists of at least 200
residues arranged in eight parallel β-strands connected and
surrounded by eight helices, with a central hydrophobic
core. Anyone familiar with protein structure can identify
the strands and helices by inspection of molecular models or
by use of computer programs such as Rasmol
(http://www.mrc.cpe.cam.ac.uk/cpe/manuals/ccp4/rasmol.html),
Molscript (Kraulis et al., Biochemistry, 1994, 33: 3515-3531),
CATH or SCOP. The barrel structure can sometimes be
circularly permuted by connecting the N- and C-termini and
cutting elsewhere, by changing the DNA that codes for the
protein. However, someone skilled in the art will know
where the original N-terminus would have been. The
numbering herein of the sequence of strands and helices is
based on the conventional position of the N-terminus. The
strands in the barrel are numbered sequentially β1 to β8 and
the helices α1 to α8 from the N-terminus. These are arranged
such that strand β8 is adjacent to and hydrogen-bonded with
strand β1. In a few cases, the barrels do not have eight
parallel β-strands. There are barrels that contain ten
parallel β-strands.
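
Two ideas from this paragraph, circular permutation and the closed numbering of the strands, can be sketched in Python. The sketch treats a protein as a plain string of residues and ignores any linker residues that joining the termini might require; the function names and the example string are hypothetical.

```python
def circular_permutation(sequence, new_start):
    """Join the two ends of a sequence and re-open it at new_start.

    new_start is the 0-based index of the residue that becomes the new
    N-terminus.
    """
    if not 0 <= new_start < len(sequence):
        raise ValueError("new_start must index a residue of the sequence")
    return sequence[new_start:] + sequence[:new_start]


def strand_neighbours(strand, n_strands=8):
    """Return the strands adjacent to a given strand in a closed barrel.

    Strands are numbered 1..n_strands from the N-terminus; because the
    barrel is closed, the last strand neighbours the first.
    """
    if not 1 <= strand <= n_strands:
        raise ValueError("strand number out of range")
    previous_strand = (strand - 2) % n_strands + 1
    next_strand = strand % n_strands + 1
    return previous_strand, next_strand


print(circular_permutation("ABCDEFGH", 3))  # DEFGHABC
print(strand_neighbours(8))                 # (7, 1)
```

Passing `n_strands=10` covers the less common ten-stranded barrels mentioned above.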

The active site is always in the same region of the protein,
at the C-terminal end of the barrel, and is formed by residues of the eight
loops connecting the carboxy end of each strand with the
amino end of the following helix.

The α/β-barrel enzymes have two sets of loops. The
C-terminal end contains the β-loop-α units, whose loops show wide
variation in structure and length. The loops in the
α-loop-β units within the barrel are shorter, and they can
adopt two different conformations for strand entry into the
parallel β sheet (Branden, C., supra; Chothia, C. & Lesk,
A. M., Conformations for strand entry into parallel β sheets,
pp 49-58 (1991), in Molecular Conformation and Biological
Interactions, Ed. Balaram, P. and Ramaseshan, S., Indian
Academy of Sciences, Bangalore).



In the scaffold there are mainly three pieces that can be
combined and exchanged: the lid of the active site (variable
region), and the hydrophobic area and the charged area in the
binding site (constant region); see below.

As noted, the active site lid is the structure that covers
the active site, closing and shielding it from solvent. It
may consist of or comprise loops at the carboxyl termini of
the β-strands (e.g. β1α1, β6α6), extra N-terminal
segment, extra domains (between β3α3) and/or C-terminal
segment (after β8). More than 70% of catalytic residues in
the α/β-barrel enzymes appear in these structural motifs.
These residues are directly involved in the rate-limiting
step in the reaction mechanism. The rest of the catalytic
residues are located in the loops at the carboxyl termini
that form the binding site. They are involved in specific
substrate binding and catalysis, but their main role is
interaction with the substrate (holding it in the correct
position), and they do not participate in the rate-limiting
step in the reaction mechanism.

The binding site is the structure (mainly loops) at the
carboxyl termini of the β-strands that forms the
funnel-shaped pocket. It contains 90% of the residues that
participate in binding (holding the substrate in the correct
position for catalysis) and 30% of the residues that
participate in binding and catalysis in the overall reaction
but not in the rate-limiting step of the reaction mechanism.

The binding site can be divided into two areas, on the basis
of the chemical nature of the amino acid side-chains which form
it: a hydrophobic area and a charged area. More than 60% of
the residues in the hydrophobic area are hydrophobic
(e.g. leucine, isoleucine, alanine, valine, phenylalanine).
More than 60% of the residues in the charged area are
positive, negative or polar (e.g. aspartic acid and glutamic
acid (-), lysine and arginine (+), asparagine, glutamine,
cysteine, histidine, tryptophan).
Fersht, supra. Branden, C., supra.
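By way of illustration only, the 60% rule described above can be sketched in a few lines of Python. The residue sets are taken from the examples just listed; the function name, the one-letter-code input and the "mixed" label are assumptions introduced for this sketch and are not part of the method described.

```python
# Illustrative sketch only: residue sets follow the examples given in the text.
HYDROPHOBIC = set("LIAVF")            # Leu, Ile, Ala, Val, Phe
CHARGED_OR_POLAR = set("DEKRNQCHW")   # Asp, Glu, Lys, Arg, Asn, Gln, Cys, His, Trp


def classify_area(residues, threshold=0.60):
    """Label a group of binding-site residues (one-letter codes) by the >60% rule."""
    residues = residues.upper()
    if not residues:
        raise ValueError("no residues given")
    hydrophobic = sum(r in HYDROPHOBIC for r in residues) / len(residues)
    charged = sum(r in CHARGED_OR_POLAR for r in residues) / len(residues)
    if hydrophobic > threshold:
        return "hydrophobic"
    if charged > threshold:
        return "charged"
    return "mixed"  # hypothetical label for groups meeting neither criterion
```

For instance, a group of residues given as "LIVAFD" would be labelled hydrophobic, since five of its six residues fall in the hydrophobic set.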

Since the locations of the hydrophobic and the charged
binding sites remain constant, they may be considered as
"constant" pieces. Mutation of the constant pieces may be
used to change substrate binding. Alongside these constant
features, there is a variable region, the "covering lid"
placed over the site, which closes and shields it from the
solvent.
The "constant" pieces:

Phosphate or charged binding site.
The constant region: e.g. the β7-α7 and β8-α8 segments are part of
the phosphate-binding site in at least 10 different
α/β-barrels. Farber & Petsko TIBS 15, 228-234 (1990). Reardon &
Farber FASEB J. 9, 497-503 (1995). Wilmanns et al.
Biochemistry 30, 9161-9169 (1991); Branden, C., supra.

Small modifications in these "constant" regions cause
different orientations of the phosphate group of the
substrate, which may lead to changes in substrate affinity,
e.g. those seen with PRAI and IGPS. Wilmanns, M., Priestle, J. P.,
Niermann, T. & Jansonius, J. N. Three-dimensional structure
of the bifunctional enzyme phosphoribosylanthranilate
isomerase: indoleglycerolphosphate synthase from Escherichia
coli refined at 2.0 Å resolution. Journal of Molecular
Biology 223, 477-507 (1992).

Hydrophobic pocket.

The β2-α2 and β4-α4 loops are part of the hydrophobic pocket in
the active site. For glycolate oxidase and flavocytochrome b2, a
few mutations in the active site have been fine-tuned to
make them effective on different substrates. Branden, C.,
supra.

"Variable region": active site lid.

Extra N-terminal segment. The N-terminal structural segment
that is not part of the α/β barrel and leads into strand β1.
Branden, C., and Tooze, J., supra.
β1-α1 loop. The structure at the carboxyl terminus of
β-strand number 1 that leads into α-helix 1. Branden, C.,
and Tooze, J., supra.

β6-α6 loop. The structure at the carboxyl terminus of
β-strand number 6 that leads into α-helix 6. Branden, C.,
and Tooze, J., supra.

Metal binding site. In some superfamilies (e.g. the
metal-dependent hydrolases) the structural segments β5-α5 and
β7-α7, together with the C-terminus, are part of the
metal-binding site. Branden, C., and Tooze, J., supra.

Loops forming other domains. An additional loop region from
a second domain or a different subunit may come close to
the active site and participate in binding and catalysis, as
is found for pyruvate kinase and amylase, in which the loop
β3-α3 is folded into a separate domain. Branden, C., and
Tooze, J., supra.

C-terminal segment. The segment at the C-terminal end of the
barrel, which shows wide variation in structure and length
(Table I and II). It is considered as starting from the
C-terminal end of β8. In some enzymes, the C-terminal end is
part of the lid. Branden, C., and Tooze, J., supra.

The classification devised by the present inventors is based 
on the structures of phosphoribosylanthranilate isomerase 
(PRAI) and indole-3-glycerol-phosphate synthase (IGPS) as 
models (Table I and Table II).
The main structural feature of the active site lid in
Class I (or the PRAI group) of α/β-barrel proteins is the
connection β6-α6 (10-12 residues), which is rich in glycine
residues. For example, PRAI, triosephosphate isomerase,
class II aldolases and pyruvate kinase, which belong to this
first class, contain the highly conserved sequences GXGGXG,
GXG or GXXG. The lack of side chains in the loop β6-α6
sterically favours its approach to the remainder of
the structure and thus its covering of the active site. We call
this the Class I or "lotus leaf" lid (Table I and Figure 1). The
Class I group is characterised by the absence of an
N-terminal extension, or its replacement by a very short
segment (2-9 amino-acid residues), generally accompanied by
a characteristically short β1-α1 connection segment (2-11
residues).

The IGPS domain belongs to Class II (Table II and Figure 1).
Its lid is shaped like a clover leaf and encompasses three
main substructures. The first two structural segments
show wide variation in structure and length.
These are an extra N-terminal segment and the β1-α1 structural
segment. The number of residues in both components together
varies from 18 to 89 (Table I and II). The segment
connecting β6 to α6 (10-12 amino acid residues) does not
contain any particularly conserved sequence among different
superfamilies. It is positioned to interact with the
N-terminal segment when the lid is closed over the binding
site.
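As a non-limiting illustration, the length and sequence criteria given above for the two lid classes can be expressed as a rough heuristic. The thresholds and glycine motifs are those quoted in the text; the function name, its inputs and the "undetermined" outcome are assumptions made for this sketch, and a real assignment would rest on the 3-D structure (Tables I and II) rather than on these numbers alone.

```python
import re

# Glycine-rich motifs quoted in the text for the Class I beta6-alpha6 loop
# (X = any residue). GXGGXG is also matched by GXG; it is listed for readability.
GLY_MOTIFS = [re.compile(p) for p in ("G.GG.G", "G.G", "G..G")]


def guess_lid_class(n_term_len, b1a1_len, b6a6_seq):
    """Heuristic guess of lid class from segment lengths and the b6-a6 loop sequence."""
    gly_rich = any(m.search(b6a6_seq.upper()) for m in GLY_MOTIFS)
    short_n_term = n_term_len <= 9       # absent, or a very short segment (2-9 residues)
    short_b1a1 = 2 <= b1a1_len <= 11     # short b1-a1 connection (2-11 residues)
    if short_n_term and short_b1a1 and gly_rich:
        return "I"
    if 18 <= n_term_len + b1a1_len <= 89:  # combined range quoted for Class II
        return "II"
    return "undetermined"
```

The loop sequences passed to such a function would come from the structural alignment; the sketch does not attempt to locate the loops itself.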



Correlation between the structural class of the lid and the
reaction mechanism.
The structure of the active site lid relates to the
mechanism (Table III). For example, triosephosphate
isomerase and xylose isomerase both catalyse aldose-ketose
isomerisations of different substrates. The first enzyme
belongs to Class I and uses a proton-transfer mechanism.
The second (Class II) uses a hydride-transfer mechanism.

In an enzyme family that catalyses the same reaction by the
same mechanism but for different substrates, the
classification of the lid remains the same, but the lids
vary in length and sequence to generate the different
specificities (Table III). For example, aldol-ketol
isomerisations in TIM-like aldol-ketol isomerases are
mechanistically related to 2-hydroxyaldimine-ketoamine
isomerisations (a reaction known as the Amadori rearrangement)
in PRAI. In both cases, general-base catalysed proton
abstraction and repositioning occur, although the reaction
intermediates are different. Both enzymes belong to Class I
(Table I and III). The metal-dependent hydrolase superfamily
is another example of this. This family uses a dozen
different substrates and is responsible for seven of some 20
steps along four important metabolic pathways. Its members have a
common reaction mechanism: the metal ion (or ions) activates
a water molecule for nucleophilic attack on the substrate.
They are all in our Class II (Tables II and III).

Variations in the C-terminal regions of the barrel and the
active loop regions.
Changes in residue spacing play a
major role in the evolution of protein function, with insertions
and deletions contributing substantially to the
diversification of enzyme activities. At one level in the
α/β-barrel family, such changes can lead to changes in
specificity while retaining membership of Class I or II.
An interesting example is the enolase superfamily (Class
II). During evolution its members have retained the structural
strategy for catalysing the chemically difficult step of
α-proton abstraction, but they have gained additional functional
groups to catalyse different overall reactions. Further,
more radical changes can lead to a change of lid design,
accompanied by a change in class and a change in mechanism,
or to the evolution of new function, e.g. as with PRAI and IGPS.

See Annex 1 below for further discussion of evolution of new 
catalytic activities. 

As described in the experimental section below, the
inventors proved the principle of the invention by
converting an α/β-barrel protein, indoleglycerolphosphate
synthase (IGPS), into phosphoribosylanthranilate isomerase
(PRAI). The resultant enzyme is of similar catalytic
activity to the naturally occurring enzyme and, at low
substrate concentrations, is even more active.

Combinatorial design 

The invention thus provides a general procedure for 
producing new enzymes, employing what may be termed 
combinatorial design. 

The invention generally provides for design and production 
of an enzyme that catalyses a desired reaction on a desired, 
or target, substrate. 

In one approach according to the invention, a barrel binding
the desired substrate is selected or provided, either by
choosing a naturally occurring barrel which binds the
substrate or by mutating and selecting another barrel. Such
selection will generally involve determining the ability of a
barrel to bind the target substrate, and may employ any
technique available in the art, for instance phage or
ribosome display. See e.g. Fersht, supra, chapter 14.

A lid, based on the template of a lid for an α/β-barrel that
catalyses the desired reaction or a reaction of the desired
type, is grafted on to or engineered into the barrel that
binds the substrate, to combine a binding site for the
target substrate with a catalytic template.

The lid is then subjected to targeted mutation and
selection. Rules and guidance for this are provided below.

Both lid and substrate binding sites may be subjected to
mutation and selection to alter or optimise respective
properties, e.g. one or more of binding affinity and
catalytic activity.



Transplantation of Class I and Class II lids
Examination of the classes identified herein leads to
recognition of where the catalytic groups are and so which
should be or should preferably be transplanted.

Summary of location of catalytic loops

Class I

Catalytic groups are mainly in the β6-α6 loops; some
catalytic groups are in the β1-α1 and β2-α2 loops. (The β6-α6
loop connects strand β6 and helix α6, etc.)

Class II

Catalytic groups are mainly in the β1-α1 and β6-α6 connecting
loops and the N-terminal extension and C-terminal extension.
A Class I lid that catalyses a particular reaction may be
grafted onto a Class II scaffold as follows:

the N-terminal extension of the Class II scaffold is
deleted;

the β1-α1 loop is shortened;
the β6-α6 loop is modified.

A Class II lid that catalyses a particular reaction may be
grafted onto a Class I scaffold as follows:

an N-terminal extension is added;

the β1-α1 loop is lengthened;

the β6-α6 loop is modified.
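Purely as an illustration, the two grafting recipes above can be written as a small lookup. The function name and step strings are hypothetical conveniences; the case in which lid and scaffold belong to the same class is not covered by the two lists above, so the sketch returns an empty plan for it.

```python
def graft_plan(lid_class, scaffold_class):
    """Return the scaffold modifications listed in the text for a cross-class lid graft."""
    if lid_class not in ("I", "II") or scaffold_class not in ("I", "II"):
        raise ValueError("classes must be 'I' or 'II'")
    if lid_class == "I" and scaffold_class == "II":
        return ["delete N-terminal extension",
                "shorten b1-a1 loop",
                "modify b6-a6 loop"]
    if lid_class == "II" and scaffold_class == "I":
        return ["add N-terminal extension",
                "lengthen b1-a1 loop",
                "modify b6-a6 loop"]
    return []  # same-class grafts are not spelled out in the two lists above
```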

Choice of substitutions 

Loops may be changed to a consensus sequence found from
examining a family of α/β-barrels that catalyse the desired
reactions.
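A minimal sketch of how such a consensus might be derived from pre-aligned loop sequences is given below. It assumes the loops have already been aligned to equal length (with '-' for gaps), and uses a simple majority threshold; the function name, the threshold and the use of 'X' for variable positions are assumptions of this sketch, not a prescribed procedure.

```python
from collections import Counter


def consensus(aligned_loops, min_fraction=0.5):
    """Column-wise consensus of equal-length aligned loops; 'X' marks variable positions."""
    if not aligned_loops:
        raise ValueError("no sequences given")
    length = len(aligned_loops[0])
    if any(len(s) != length for s in aligned_loops):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned_loops):
        residue, count = Counter(column).most_common(1)[0]
        # A position counts as conserved only if one residue (not a gap)
        # occupies more than min_fraction of the column.
        conserved = residue != "-" and count / len(column) > min_fraction
        out.append(residue if conserved else "X")
    return "".join(out)
```

With three made-up loop sequences "GAGGSG", "GSGGTG" and "GTGGAG", the sketch yields the pattern GXGGXG.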

More detailed practical points to consider

1. Choice of scaffold for the desired function or catalytic
activity

A suitable scaffold is chosen, and this may take into
account biochemical and structural analysis, considering any
one or more of the following:

Biochemical data for scaffold and reference proteins

a) Is the scaffold a monomeric or an oligomeric protein? A
monomeric protein may be preferred, where available.

b) Is there a good expression level in bacteria and is it a
well-characterised gene? Is it part of a regulon? Is it part
of a metabolic pathway? Can in vivo selection be used?

c) What is known about its function, activity assay and ligands
(substrates, inhibitors, effectors, metals, etc.)?

d) Kinetic characterisation: kinetic parameters, kinetic 
mechanism. 

e) Reaction mechanism. 

f) Role of specific residues from mutagenesis studies 

g) Molecular properties in solution. 

h) Folding studies. 



Structural data of both scaffold and reference proteins.

a) Primary structure. Sequence alignment, identification of
orthologous proteins (proteins with the same activity in
different species) and neighbour families (proteins with
conserved structural or functional patterns). Consensus
sequences. Conserved signatures.

Secondary structure: α/β-barrel family fold.

b) 3-D structure of enzyme-ligand complex, apoprotein and/or
holoprotein structure. Detailed description of the active
site: the lid, the binding site and the topology of the
molecule.

3-D analysis of both proteins, using the PDB (Protein Data
Bank), CATH (see above) (Thornton, supra) and FSSP (Fold
classification based on Structure-Structure alignment of
Proteins) (L. Holm and C. Sander. Mapping the protein
universe. Science 273:595-602 (1996)). The FSSP database is
based on exhaustive all-against-all 3D structure comparison
of protein structures currently in the Protein Data Bank
(PDB). The classification and alignments are automatically
maintained and continuously updated using the Dali search
engine. See more details at http://www2.ebi.ac.uk/dali/fssp/.

2. Design

The following provides guidance for embodiments of the
present invention.

a) Active site lid. Based on the active site lid
classification provided herein, first identify the class
to which the lid of the desired protein belongs. How many
components are part of the lid? A practical rule is to
focus on the extra N-terminal segment and the loops
β1-α1, β6-α6 and β3-α3 (looking for extra domains). When
fragments of the loops β7-α7 and β5-α5 are part of the lid,
this means that the template of the metal binding site is
involved in catalysis. Use the CATH database to obtain the
topology of the protein. See more details at
http://tops.ebi.ac.uk/tops/ExplainDetailed.html

In general, there is a correlation between the length of the
extra N-terminal segment and the length of the loop
β1-α1, i.e. both are short or both are long. In lid Class I,
the leading structural feature of the active site lid is the
connection β6-α6, which is rich in glycine residues. In
lid Class II there are at least three main components:
the extra N-terminal segment, the loop β1-α1 and the loop
β6-α6; sometimes in addition there are fragments of loops
β3-α3, β7-α7, β5-α5 or the C-end segment. The next step may
be identification of the residues involved in catalysis,
which are usually localised in the lid (Altamirano and
Fersht, supra). Further, the lid plays an additional role in
substrate discrimination because the size of the ligands is
related to the lid class (Fersht, supra). Finally, the
conserved features in the lid within different members of
the family (Altamirano et al., submitted) may be identified
using the FSSP program (see above).

b) Constant regions. Attention focuses on the binding site.
We identify the polar region and the most hydrophobic area
in the active site. The polar region commonly appears
localised between the loops β7-α7 and β8-α8 (the phosphate
binding site), while the hydrophobic area is localised
between β2-α2, β4-α4 and β3-α3.

We identify the residues directly involved in substrate
interaction and the conserved features within different
members of the family. We are now ready for the superposition
of both structures and other neighbouring structures. See
the next section.



Thus, the binding site of the α/β-barrel family can be
divided into three regions in order to locate and modify the
sections of the protein involved in catalysis and binding in
accordance with the present invention:
1. Active site lid - the primary determinant of the
chemical reaction that is catalysed

This consists of the loops β1α1 and β6α6, the extra
N-terminal region and the carboxyl terminus. The lids are
divided into two classes.

Class 1 lid: β1α1, the extra N-terminal region and the
carboxyl terminus are characteristically shorter than Class
2 lid components. The β6α6 loop often has a distinctive
sequence composition that is rich in glycine residues.
Class 2 lid: β1α1, the extra N-terminal region and the
carboxyl terminus are characteristically longer than in Class 1
lids. The β6α6 loop tends to be longer than the Class 1 lid
component. Class 2 lids are more abundant, and their
structures are more adaptable.

The active site lid dictates the nature of the reaction
catalysed.

2. Hinge region

The hinge region consists of the last two residues of each
β-strand and the first two residues of each of the
associated loops. Hinge regions can contain residues that are
involved in catalysis and binding.
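The definition of the hinge region lends itself to a short sketch. The code below assumes that strand boundaries are known as residue numbers and that each associated loop begins at the residue immediately following its strand; the function name and the input format are hypothetical.

```python
def hinge_positions(strand_ranges):
    """Map each strand to its hinge: last two strand residues plus first two loop residues.

    `strand_ranges` maps a strand name to (first, last) residue numbers, inclusive.
    Assumes the associated loop starts at the residue immediately after the strand.
    """
    hinges = {}
    for name, (first, last) in strand_ranges.items():
        if last - first < 1:
            raise ValueError(f"strand {name} is shorter than two residues")
        hinges[name] = [last - 1, last, last + 1, last + 2]
    return hinges
```

For a strand spanning residues 50-55, for example, the hinge would comprise residues 54, 55, 56 and 57.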

3. Body loops - important in specificity

Loops β2α2 and β4α4 bind the hydrophobic regions of the
substrate.

Loops β7α7 and β8α8 bind the charged regions of the
substrate.

Strands β3, β5 and β8 can contain the metal binding sites.
Loop β3α3 may be recruited into the hydrophobic binding
site.

3. 3-D superposition analysis.

First, focus attention on the scaffold. Note the barrel
shape and the segments having a counterpart in the other
protein. The next step consists of the identification of
segments that may overlap, with an r.m.s.d. of 2-3 Å.

Finally, focus attention on the segments with more than five
residues that cannot be structurally aligned. Identify all
the insertions and deletions or any other drastic changes in
the secondary structure. Identify the segments that can be
aligned by joining each insertion or deletion. Use the FSSP
database (see supra).

Select the segments that have more than five residues that
cannot be structurally aligned and use them as target
points.
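The selection of target points can be sketched as follows. The sketch assumes that a structural superposition (from whatever program is used) has been reduced to a per-residue list giving, for each residue of one protein, the index of its structural equivalent in the other protein or None where there is none; this representation and the function name are assumptions made for illustration.

```python
def unaligned_segments(equivalences, min_len=6):
    """Return (start, end) index pairs of runs with no structural equivalent.

    `equivalences[i]` is the index of the residue in the other protein that
    superposes on residue i, or None if there is none. Runs of more than five
    residues (min_len=6) are reported, with inclusive 0-based indices.
    """
    segments, start = [], None
    for i, eq in enumerate(list(equivalences) + [0]):  # sentinel closes a trailing run
        if eq is None:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    return segments
```

Shorter unaligned runs are ignored by design, in line with the five-residue cut-off stated above.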

Analyse these structural data in the light of the study of
their functions. Is the active-site lid the target? If
so, then use data about reaction mechanism, catalytic
residues, the structural components of the lid and the lid
class. Is the binding site, part of the "constant regions",
the target? If so, use data about interaction with ligands,
affinity constants, stereochemical constraints, etc.

4. Design the modification of the scaffold.

Select the segments on which you will graft the lid or the
binding site by insertion, deletion or random targeted
mutagenesis. Concentrate on the segments chosen as pivots
(joint points) of the segment to be inserted or the segment
to be deleted.

To make an insertion, choose the random mutations and the
conserved sequence carefully, and introduce the superfamily
consensus sequences (residues conserved during evolution
in different species). Design a set of synthetic DNA
fragments of the target points from diverse species. The
scaffold is now ready for fitting its shape and its
function.
Outline of a procedure in accordance with the present
invention

Step 1.

Provision of a scaffold including binding site for substrate
Case A

If a known α/β-barrel has the desired binding site for the
substrate, then employ this. In this case, a lid will be
chosen from another α/β-barrel that catalyses the desired
reaction, or a similar reaction of the same type.

Case B

If there is no known α/β-barrel with the desired binding
site for the substrate, then a scaffold is chosen that
catalyses the desired reaction with a similar substrate.
That is, a scaffold is chosen that catalyses the desired
reaction and has some features in its binding site that may
be adjusted for binding the desired substrate (e.g. its
hydrophobic or charged regions). In this case the scaffold
will be mutated (see below) and a variant which binds the
substrate will be selected (see below).

Step 2

Selection of targets for mutagenesis from superposition of
3-D structures.

There are two major components in the scaffold, one mainly
for the binding site and one mainly for the reaction
mechanism. There are three regions that can be modified:
the hydrophobic and the polar parts of the substrate binding
site, and the catalytic lid.
Case A

(Where a scaffold for binding the substrate is known in a
protein, and there is another protein known that catalyses
the desired or a similar reaction).

The substrate binding scaffold is used and an appropriate
template catalytic lid is grafted on.

For the choice of lid for the reaction mechanism, conserved
features in the superfamily may be examined and superposed
with those of the binding scaffold.

Case B

(Where a protein is known that catalyses the appropriate
reaction on a different substrate that has similarities to
the desired substrate.)

For the binding site, conserved features in the superfamily
may be examined, and superposed with those of the binding
scaffold.

In either of Case A and Case B, target residues for
mutagenesis (by insertion, deletion or introduction of
consensus sequences) may be chosen as segments of five or
more residues that cannot be structurally aligned with the
consensus of those from the superfamily.

Step 3
Mutagenesis and selection
Convenient methods for mutagenesis, sexual recombination and
selection of active protein are available in the art, and
some are described below. These generally involve design
and preparation of synthetic DNA fragments for creating
further diversity in the target sequences. The shape of the
barrel may be refined to improve its function by in vitro
evolution methods.

Each of Cases A and B has been exemplified experimentally,
as described in more detail below. Further brief details
and discussion are provided here.



CASE A

(Use of a pre-existing binding site and grafting of a template
active-site lid, which is modified by insertions, deletions
and/or recombination).

Step 1

Scaffold selection

A monomeric α/β-barrel protein, indole-3-glycerol-phosphate
synthase (IGPS), was chosen as a scaffold able to
bind the desired substrate.

The desired enzyme activity was that of
phosphoribosylanthranilate isomerase (PRAI).

Selection system

An in vivo selection strategy for PRAI activity was designed
based on complementation of E. coli JA300 (a PRAI-deficient
strain that does not grow in the absence of tryptophan
(Trp)). In E. coli, PRAI and IGPS are part of the same 45
kDa polypeptide chain specified by the trpC gene. However,
E. coli JA300 carries the W3110 (trpC1117) allele and so
lacks isomerase activity, but retains normal levels of
synthase activity. Complementation indicates that
the specific clone contains a plasmid expressing an IGPS
variant with PRAI activity.
Step 2

3D superposition
The structures of IGPS and PRAI were superimposed using the
program SETOR.

Scaffold

All β-strand residues of the central β-barrel of PRAI have
counterparts in IGPS. 68% of the α-helical residues
have structurally equivalent residues in the other domain.

Active site lid class

The IGPS active site is covered by the N-terminal α0 helix,
and by the β1-α1 (15 residues), β2-α2 (9 residues) and β6-α6
(11 residues) loops, all located at the C-terminal side of
the α/β-barrel. This defines the IGPS protein as having a
Class II active site lid.

PRAI, however, has a very different active site lid that is
mainly formed by the β2-α2 (10 residues), β6-α6 (11 residues)
and β8-α8 (12 residues) loops. PRAI has a Class I active
site lid.

Constant regions in the active site

The β2-α2 loop in both enzymes is involved in binding the
anthranilate moiety of the respective substrates PRA and
CdRP. The β8-α8 loop comprises the phosphate binding site.
The superposition of the two structures reveals almost
identical locations but different orientations of the
phosphate binding site. Since these loops (β2-α2, β7-α7 and
β8-α8) are similarly arranged in the two enzymes, the targets
of our selection were solely the extra N-terminal end (helix
α0 and two bends), the β1-α1 loops and the β6-α6 loops.

Active site lid as the target for switching reaction
mechanism

The first step was grafting a PRAI lid on to an IGPS scaffold
that contains a common binding site. The process included
the deletion of 48 amino acid residues from the amino
terminal end of IGPS; this deletion mutant was called
IGPS49. The IGPS49 scaffold was further modified by
replacing 15 amino acid residues corresponding to the β1-α1
loop with new randomised segments of 4 to 7 amino acid
residues. The gene encoding IGPS49 was used as template to
create three new libraries, IGPS49L1 (GKXXG), IGPS49L1RGD
(GKXRGD) and IGPS49L1SV (length size variation: GKXX, GKXXX,
GKXXXX or GKXXXXX), via PCR methodologies including overlap
extension PCR, inverse PCR and random primer PCR.
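To give a feel for the theoretical diversity of the three loop libraries, the number of distinct peptide sequences per template can be counted. This is a back-of-envelope sketch only: it assumes that each X may be any of the 20 standard amino acids at the protein level, whereas the actual codon randomisation scheme is not stated here; the function and variable names are hypothetical.

```python
def library_size(template, alphabet_size=20):
    """Number of distinct peptide sequences encoded by a template where X is randomised."""
    return alphabet_size ** template.upper().count("X")


# Templates as quoted in the text for the three b1-a1 loop libraries.
LIBRARIES = {
    "IGPS49L1": ["GKXXG"],
    "IGPS49L1RGD": ["GKXRGD"],
    "IGPS49L1SV": ["GKXX", "GKXXX", "GKXXXX", "GKXXXXX"],
}

# Theoretical protein-level diversity per library (sum over its templates).
sizes = {name: sum(library_size(t) for t in templates)
         for name, templates in LIBRARIES.items()}
```

Under these assumptions, the length-variation library dominates the diversity, since its longest template alone has five randomised positions.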

The next set of modifications involved the β6α6 loop,
including the introduction of an aspartic acid residue at
position 184 (acting as a general base in the active site)
and also the PRAI consensus sequence GXGGXGQ2 1 , with the aim
of improving the active site lid. A new library called
IGPS49L1L6 was constructed using the IGPS49L1, IGPS49L1RGD
and IGPS49L1SV libraries as templates.

In vitro recombination to improve the fit of the barrel
shape and its function followed by in vivo selection
In this phase, a first round of DNA shuffling was performed
with the pool of genes from the selected clones that were
able to grow at very low concentrations of Trp. A second
round of recombination was performed by DNA shuffling and
the staggered extension procedure (StEP), using the pool of 80
colonies selected from the first round and synthetic DNA
fragments encoding the protein segments corresponding to
loops β1α1, β6α6 and β4α4 of PRAI from diverse species. The in
vivo selection yielded 360 colonies capable of growing in
the absence of any exogenous Trp.

In vitro-evolved PRAI

The newly evolved phosphoribosylanthranilate isomerase has
similar catalytic properties to the natural enzyme, with an
even higher specificity constant.



CASE B

A scaffold containing a catalytic lid was selected and
changes made in the binding site (constant pieces).

Step 1

Scaffold selection

Human phosphotriesterase homology protein (PHP) was chosen
as a scaffold. It binds the substrate for the desired
enzymatic activities.

The desired enzymatic activities were phosphotriesterase
(PTE) activity and phosphodiesterase (PDE) activity.

PHP does not have a known enzymatic activity, though it has
28% sequence identity with phosphotriesterase, is monomeric
and binds two zinc ions per monomer. Unlike
phosphotriesterase, PHP does not catalyse the
hydrolysis of either nonspecific phosphotriesters or
phosphodiesters (a promiscuous activity in PTE).
Phosphotriesterase is an enzyme capable of hydrolysing both
widely employed pesticides and phosphofluoridates.
Step 2

Sequence Alignment

Phosphodiesterase, PHP (E. coli), PHP (M. pneumoniae), PHP
(M. tuberculosis), PHP (mouse) and PHP (human) are 27-30%
identical in amino acid sequence. The aspartate and all
four histidine residues that coordinate Zn in
phosphotriesterase are conserved across the six PHP
proteins. Only the carbamylated lysine at position 169 is
not strictly conserved. This residue is replaced by a
glutamate and is shifted by one position in the alignment
for ePHP, muPHP, rPHP, hPHP.
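Percentage identities of the kind quoted above are computed from a pairwise alignment. A minimal sketch follows; it assumes the two sequences are already aligned to equal length with '-' for gaps, and it counts identity over columns in which neither sequence has a gap. Other conventions for the denominator exist, and the text does not say which was used, so the choice here is an assumption.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned columns")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)
```

The same alignment can also be used to check whether specific positions, such as metal-coordinating residues, are conserved column by column.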

3-D superposition

The structures of PHP from E. coli and PTE were superimposed
using the program DALI.

Scaffold

All β-strand residues of the central β-barrel of PHP have
counterparts in PTE. More than 70% of the α-helical
residues have structurally equivalent residues in the other
domain.

Active site lid class

The PTE active site is covered by the N-terminal segment
(residues 35-51, including two strands of antiparallel
β-sheet), the β1-α1 loop (residues 56-76, including β-sheet,
turns and a helical turn), the β6-α6 loop (residues 229-237)
and a segment of β7-α7 (only residues 254-256), all located
at the C-terminal side of the β-barrel. The lid class is II.
PHP has a slightly different active site lid: the N-terminal
segment is shorter (8 residues). The lid is mainly formed by
the β1-α1 loop (18 residues, encompassing antiparallel
β-strands, residues 17-32) and the β6-α6 loop (11 residues,
quite similar in both proteins). PHP has a Class II active
site lid.

Constant regions in the active site

Importantly, significant differences between the two
structures are found in the regions corresponding to the
binding site of the PHP. The β3-α3 loop is involved in
binding substrates with hydrophobic and smaller leaving
groups, such as ethoxy groups, in both proteins. In PTE, the
β7-α7 loop has an insertion of 14 residues, and the β8-α8
loop has an insertion of 8 residues, with respect to the PHP
sequence. These bind the phosphorus centre and are involved
in binding substrates with large, bulkier hydrophobic
leaving groups such as a methylbenzyl group. The superposition of
the two structures reveals almost identical locations for
the residues involved in metal ligation. Since the lids,
including the metal binding site, are similarly arranged in
the two enzymes, the targets of the selection were a fragment
of the loop β7-α7 (residues 260-276) and the whole of the
β8-α8 loop.

Constant pieces as the target for switching specificity
The first step in the design was grafting a template of the
PTE substrate binding site on to a PHP scaffold by insertion
of 18 amino acid residues in the loop β7-α7 of PHP. The PHP
(+ 18 residues) scaffold is further modified by inserting 8
amino acid residues corresponding to the β8-α8 loop as a new
randomised segment, via PCR methodologies including overlap
extension PCR, inverse PCR and random primer PCR.
The binding depends significantly on the relative size and
orientation of the two subsites that accommodate the
coordination of the alkyl or aryl substituents within the
enzyme active site. Using in vitro evolution methods, the
present invention enables redesign of the active site to
alter and enhance the substrate specificity of the newly
evolved PTE.

Evolving phosphodiesterase activity in PHP.

The full negative charge within the phosphodiester substrate
is thought to be primarily responsible for the slow rate of
catalytic hydrolysis of these compounds by PTE. The
active site of PTE is largely hydrophobic, and thus it
would not be expected to accommodate the negative charge on
the substrate very well. Further, the nucleophile in the
active site (metal-bound hydroxide ion) may not be able to
attack the anionic substrate effectively. In order to
evolve phosphodiesterase activity, we include a set of
modifications: the insertion of the IGPS phosphate binding
site corresponding to the β7-α7 and β8-α8 loops. This new
binding site is able to accommodate the negative charge on
the substrate.

In vitro recombination to improve the fit of the barrel
shape and its function followed by selection, in vivo or in
vitro.

The in vivo screening system employs expression of the
protein in the periplasm and uses the strong yellow colour
or strong fluorescence produced by hydrolysis of
the substrate (paraoxon or diisopropyl fluorophosphate). The
clones with PTE activity become yellow or fluorescent.
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Summary of primary grafting rules 

Active site lid - to direct the chemical mechanism 

1. The sizes and composition of the lid components (Plal 

and p6a6, the extra N-terminal region and the carboxyl 
5 terminus) are grouped according to the Class 1 or Class 2 
size and composition categories. 

2. The sequences of the lid components in orthologous
enzymes that catalyse the desired reaction are examined and
consensus sequences or conserved residues identified to be
included in the template loops that are transplanted.

3. The size of the cavity covered by the lid may be
increased or decreased by altering the sizes of the side
chains.

Hinges

The hinge regions of loops may be included with the loops
that are transplanted into the scaffold because they may
have important residues.

Body loops - to tailor the substrate specificity 

1. The sequences of body loops of known α/β-barrel
proteins that bind the desired substrate are examined and
consensus sequences and residues included in the loops to be
transplanted.

2. If the desired substrate is not bound by other known
enzymes, then the proteins that bind the closest examples
are preferably used as models. The modifications to the
loops to accommodate the substrate can be based on the size
of the hydrophobic and charged moieties of the desired
substrate relative to known examples using the principle
that loops β2α2 and β4α4 bind the hydrophobic regions of the
substrate and β7α7 and β8α8 bind the charged region. The
body loops may also be tailored to accommodate polar
substrate residues in the hydrophobic site and hydrophobic
residues in the charged site. The size of the hydrophobic
site may be increased or decreased according to the size of
the substrate. Modifications may be made to loop α3β3 to
compensate for changes in the size of β2α2 and β4α4.

3. If the substrate is greatly different from any known
example, then substructures of the substrate may be
identified (e.g., aromatic rings, nucleosides, sugar rings,
phosphate groups or aliphatic side chains) and then the
loops from known proteins that bind these substructures can
be recruited. It is most useful to choose proteins that
bind more than one of these substructures simultaneously.

Creation of diversity

It is desirable to create diversity in the loops and
segments that are grafted by using deletions, insertions
and substitutions of sequences that can be found from
examination of naturally occurring orthologous families.



EXPERIMENTAL 



SECTION 1 

CLASSIFICATION OF α/β-BARREL PROTEIN LIDS AND IMPLICATIONS
FOR ENZYME DESIGN

In this first section, combinatorial design principles in
α/β-barrel proteins for the creation of novel biocatalysts
are described.

The α/β-barrel motif is Nature's favourite fold for the
generation of enzymatic activity. Nature appears to have
evolved a structural framework enabling the rapid evolution
of active sites, the understanding of which facilitates the
design of new proteins in vitro. There are two constant
features in the active sites of many α/β-barrels, which
differ only in detail: a binding site for the phosphate or
any other charged group of the substrate; and a hydrophobic
binding site. Mutation of these leads to changes in substrate
binding. Between the constant features are variable regions
that contain most of the catalytic residues, the "covering
lids". The inventors here categorise the α/β-barrel domains
into two classes, according to the overall template
structure of the lids, and indicate that the template of the
lid dictates the type of reaction mechanism. The
combinatorial association of lids and constant binding
regions coupled with mutation and selection provides a basis
for generation of new enzymatic activities in vitro, as is
proven in the experimental example in Section 2 below.

The α/β or TIM (triosephosphate isomerase) barrel is the
most common motif in enzyme structure and is the basic
scaffold of enzymes catalysing a wide variety of reactions
(Farber & Petsko TIBS 15, 228-234 (1990); Murzin et al. J.
Mol. Biol. 247, 536-540 (1995); Reardon & Farber FASEB J.
9, 497-503 (1995); Holm & Sander Nucleic Acids Res. 24,
206-209 (1996); Chothia & Lesk, Conformations for strand
entry into parallel β-sheets, pp 49-58 (1991), in Molecular
Conformation and Biological Interactions, Ed. Balaram, P. and
Ramaseshan, S., Indian Academy of Sciences, Bangalore). The
basic framework consists of at least 200 residues arranged
in eight parallel β-strands connected and surrounded by
eight helices, with a central hydrophobic core. The
α/β-barrel enzymes have a variety of quaternary arrangements
and show little or no homology, except for those that catalyse
the same reactions in different organisms. Nevertheless,
their active site is always in the same region of the
protein, at the C-terminus, and is formed by the eight loops
connecting the carboxy end of each strand with the amino end
of the following helix (Lesk et al. Proteins 5, 139-148
(1989); Murzin et al. J. Mol. Biol. 236, 1382-1400 (1994);
Murzin et al. J. Mol. Biol. 236, 1369-1381 (1994)). The
connections between the C-termini of helices and strands
usually involve short loops, whereas those from strands to
N-termini are long and provide a structural basis for
binding and catalytic sites. Most of the catalytic residues
in the α/β-barrel enzymes appear in these loops, which form
a covering lid over the site, shielding it from solvent
(Branden Curr. Opin. Struct. Biol. 1, 978-983 (1991);
Branden and Tooze Introduction to Protein Structure, 2nd
Edition (Garland Publishing Inc., New York, 1999)). There
are two constant features in the barrel: a hydrophobic
region that binds part of the substrate; and a phosphate
binding site, which may be modified to bind other charged
groups, such as metal ions. The α/β-barrel fold has been
extensively analysed from an evolutionary perspective.
Farber et al. (Farber & Petsko TIBS 15, 228-234 (1990);
Reardon & Farber FASEB J. 9, 497-503 (1995)), based
mainly on structural criteria but also on function, divided
the α/β-barrel proteins into six structural families (A-F).
Chothia and colleagues noted that the pattern of packing
inside the β-barrel of glycolate oxidase and
ribulose-1,5-bisphosphate carboxylase oxygenase (rubisco) is
similar and differs from that inside the barrel of
triosephosphate isomerase, which has the most asymmetric
cross-section and is very distorted (Lesk et al. Proteins 5,
139-148 (1989); Murzin et al. J. Mol. Biol. 236, 1382-1400
(1994); Murzin et al. J. Mol. Biol. 236, 1369-1381 (1994)).
Petsko (Neidhart et al. Nature 347, 692-694 (1990); Neidhart
et al. Biochemical Society Symposia 57, 135-141 (1990)) and
Branden (Branden Curr. Opin. Struct. Biol. 1, 978-983 (1991);
Branden and Tooze Introduction to Protein Structure, 2nd
Edition (Garland Publishing Inc., New York, 1999)) analysed
two sets of evolutionarily related enzymes that perform
different biological functions (mandelate racemase and
muconate lactonising enzyme, and glycolate oxidase,
flavocytochrome b2 and mandelate dehydrogenase). They
suggested that the proteins could have evolved by divergent
evolution by retaining the chemical mechanism but with
mutations in the barrel leading to different specificities.



From a survey of the α/β-barrel domains in the SCOP, CATH
and Dali databases (Murzin et al. J. Mol. Biol. 247, 536-
540 (1995); Orengo et al. Structure 5, 1093-1108 (1997);
Holm & Sander Nucleic Acids Res. 22, 3600-3609 (1994); Holm
& Sander TIBS 20, 478-480 (1995); Hubbard et al. Acta
Crystallogr D Biol Crystallogr 54, 1147-1154 (1998)) the
inventors now provide two broad classes into which these
proteins can be categorised according to the structural
design of the covering lid. The lid contains most of the
catalytic residues, and so understanding the design of the
lid is a key step in designing novel activities based on the
α/β-barrel scaffold. In particular, this allows for
mutation, recombination and alteration of the lid while
retaining a substrate binding site, thereby altering the
reaction catalysed by the enzyme on the bound substrate.

The classification is based on the structures of
phosphoribosylanthranilate isomerase (PRAI) and
indole-3-glycerol-phosphate synthase (IGPS) as models (Table I
and Table II, Figure 1), as has been described already above.
The inventors have realised that the structure of the active
site lid appears to dictate the type of reaction mechanism
(Table III). For example, triosephosphate isomerase and
xylose isomerase both catalyse aldose-ketose isomerisations
of different substrates (Banerjee et al. Protein Engineering
8, 1189-1195 (1995); Farber et al. Biochemistry 28, 7289-
7297 (1989)). The first enzyme belongs to Class I and uses a
proton-transfer mechanism. The second one (Class II) has a
hydride transfer mechanism. In an enzyme family that
catalyses the same reaction by the same mechanism but for
different substrates, the classification of the lid remains
the same, but the lids vary in length and sequence to
generate the different specificities (Table III). For
example, aldol-ketol isomerisations in TIM-like aldol-ketol
isomerases are mechanistically similar to
2-hydroxyaldimine-ketoamine isomerisations (the Amadori
rearrangement) in PRAI. In both cases, general-base
catalysed proton abstraction and repositioning occur,
although the reaction intermediates are different. Both
enzymes belong to Class I (Tables I and III). The
metal-dependent hydrolase superfamily is another example of
this (Gerlt & Babbitt et al. Curr Opin Chem Biol 2, 607-612
(1998)). This family uses a dozen different substrates and is
responsible for seven of some 20 steps along four important
metabolic pathways (Holm & Sander Proteins 28, 72-82 (1997)).
They have a common reaction mechanism: the metal ion (or
ions) activates a water molecule for nucleophilic attack on
the substrate (Wilson et al. Biochemistry 32, ^659-1694
(1992); Hong & Raushel Biochemistry 35, 10904-10911 (1996);
Volbeda et al. Curr. Opin. Struct. Biol 6, 804-^)12; O'Brien
& Herschlag Chemistry & Biology 6 (1999)), and they are all
in Class II (Tables II and III).
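
By way of illustration only, the lid-class assignments named in the preceding paragraph can be held in a small lookup, so that a lid class is chosen from the desired mechanism. The following Python sketch is not part of the original disclosure: the names LidExample, EXAMPLES and lid_classes_for are hypothetical, the list covers only the examples named in the text rather than the full content of Table III, and grouping TIM and PRAI under one "proton transfer" label is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LidExample:
    enzyme: str
    lid_class: str   # "I" or "II", as assigned in the text
    mechanism: str   # short label chosen for this sketch (an assumption)


# Only the examples named in the paragraph above; not a transcription of Table III.
EXAMPLES = [
    LidExample("triosephosphate isomerase", "I", "proton transfer"),
    LidExample("PRAI", "I", "proton transfer"),
    LidExample("xylose isomerase", "II", "hydride transfer"),
    LidExample("metal-dependent hydrolases", "II", "metal-activated water"),
]


def lid_classes_for(mechanism):
    """Return the set of lid classes observed for a given mechanism label."""
    return {e.lid_class for e in EXAMPLES if e.mechanism == mechanism}


if __name__ == "__main__":
    for label in ("proton transfer", "hydride transfer", "metal-activated water"):
        print(label, "->", sorted(lid_classes_for(label)))
```

In this toy form each mechanism label maps to a single class, which mirrors the statement that the lid template dictates the type of reaction mechanism; a real design exercise would start from the full tables.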
Changes in residue spacing play a major role in evolution
of protein function, with insertions and deletions
contributing substantially to the diversification of enzyme
activities. At one level in the α/β-barrel family, such
changes can lead to changes in specificity while
retaining membership of Class I or II. An example is the
enolase superfamily (Class II) (Gerlt & Babbitt et al. Curr
Opin Chem Biol 2, 607-612 (1998); O'Brien & Herschlag
Chemistry & Biology 6 (1999)). During evolution, its members
have retained the structural strategy of catalysing the
chemically difficult step of α-proton abstraction but they
gained additional functional groups to catalyse different
overall reactions (Gerlt & Babbitt et al. Curr Opin Chem
Biol 2, 607-612 (1998); Gulick et al. Biochemistry 37, 14358-
14368 (1998)). Further, more radical, changes can lead to
a change of lid design, accompanied by a change in class
and a change in mechanism or the evolution of new function,
e.g. as with PRAI and IGPS (Hommel et al. Biochemistry 34,
5429-5439 (1995); Darimont et al. Protein Science 7, 1221-1232
(1998)).

The two classes may be further subdivided on basis of their 
catalytic mechanism (Table III) . Class II barrels, for 
example, may also be divided into several families, 
following the criteria used in the SCOP database (Table II) 
(Murzin et al. J. Mol. Biol. 247, 536-540 (1995)). Some of
our class II barrels may be readily subdivided into some of 
Farber's categories: groups A, D, E and F ffe the IGPS 
group. There is also a correlation between our categories
and the description of the β-barrels by Chothia et al.
(Murzin et al. J. Mol. Biol. 236, 1369-1381 (1994)) based on
packing: our class I corresponds to the distorted TIM
barrel, and class II encompasses glycolate oxidase and
rubisco.

Thus, Nature may have used a three-fold combinatorial
strategy for evolving new catalytic activities from
pre-existing α/β-barrel enzymes: retention of mechanism for the
rate determining step but mutation of the binding
specificity (e.g. the formation of the enolate intermediate
in the enolase superfamily; Neidhart et al. Nature 347, 692-
694 (1990) and Neidhart et al. Biochemical Society Symposia
57, 135-141 (1990)); retention of binding specificity but
radical mutation of the lid by insertions, deletions and
recombination to change the reaction or its mechanism (e.g.
class I and II aldolases, TIM and xylose isomerase, PRAI and
IGPS; Gerlt & Babbitt et al. Curr Opin Chem Biol 2, 607-612
(1998) and O'Brien & Herschlag Chemistry & Biology 6 (1999));
and more general changes in the binding site that allow the
catalysis of a variety of different reactions with similar
mechanisms, such as in the superfamily of the
metal-dependent hydrolases (Gerlt & Babbitt et al. Curr Opin Chem
Biol 2, 607-612 (1998); Holm & Sander Proteins 28, 72-82
(1997)).

In view of these observations, the inventors now provide
practical guidance for the design of new proteins, based on
α/β-barrels as scaffolds. Once the type of lid Nature uses
for catalysing a particular type of reaction is known, such
a lid can be used as a template for catalysing further
examples of that type of reaction by grafting it onto an
α/β-barrel of known binding site. As explained already, this
provides for a general strategy for evolving a new function
in an α/β-barrel scaffold using a combinatorial approach: a
reaction-specific lid is combined with a substrate-specific
binding barrel and subjected to mutation and selection. This
approach is particularly well suited for the manipulation of
successive enzymes in biosynthetic pathways since the
product of one enzyme is the substrate for the next so they
both have a common substrate binding site. As described in
the following experimental Section 2, this strategy was
successfully used to evolve in vitro a new function in the
α/β-barrel of indole-3-glycerol phosphate synthase and
create a novel phosphoribosylanthranilate isomerase of
activity comparable to that of the natural enzyme.

SECTION 2 

PROVISION OF A NEW ENZYME USING AN α/β-BARREL SCAFFOLD

Phosphoribosylanthranilate isomerase (PRAI) activity was 
evolved from the scaffold of indole-3-glycerol-phosphate 
synthase (IGPS) by combining a preexisting binding site for 
structural elements of phosphoribosylanthranilate with a 
catalytic template required for the isomerase activity. The 
template was targeted for in vitro mutagenesis and 
recombination, followed by in vivo selection. The newly 
evolved phosphoribosylanthranilate isomerase has similar 
catalytic properties to the natural enzyme, with an even 
higher specificity constant. 

IGPS and PRAI form two covalently linked domains of a
bifunctional enzyme in Escherichia coli that catalyses two
consecutive steps in the tryptophan biosynthesis pathway
(Figure 2). The enzymes have a sequence identity of 22% and
share a common ligand: carboxyphenylamino-1-deoxy-ribulose
5-P (CdRP), which is the product of PRAI and the substrate
of IGPS. There are considerable structural differences
between them: IGPS does not isomerise PRA, and PRAI does not
catalyse the formation of the indole ring (Orengo et al.
Structure 5, 1093-1108 (1997); Holm & Sander Nucleic
Acids Res. 22, 3600-3609 (1994); Holm & Sander TIBS
20, 478-480 (1995)).

Design strategy 

There are many methods for generating diversity in a target
gene (Arnold & Volkov Curr Opin Chem Biol 3, 54-59 (1999);
Stemmer Proc. Natl. Acad. Sci. USA 91, 10747-10751 (1994);
Stemmer Nature 370, 389-391 (1994); Zhao & Arnold Nucleic
Acids Res. 25, 1307-1308 (1997); Shao et al. Nucleic Acids
Res. 26, 681-683 (1998); Giver & Arnold Curr Opin Chem Biol
2, 335-338 (1998); Zhao et al. Nat. Biotechnol 16, 258-261
(1998)). However, the generation of mutants must be coupled
to a suitable selection procedure for in vitro evolution
(Arnold & Volkov Curr Opin Chem Biol 3, 54-59 (1999);
Crameri et al. Nat. Biotechnol 14, 315-319 (1996); Crameri
et al. Nat. Medicine 2, 100-102 (1996); Crameri et al.
Nature 391, 288-291 (1998); Tawfik & Griffiths Nat.
Biotechnol 16, 652-656 (1998)). A library encoding just one
copy of each possible variant for a protein of 250 amino
acids (the size of the α/β-barrel) would contain 20^250
variants, a number constituting a mass far greater than
that of the known universe (Kauffman, S. A. (ed.) The
origins of order (Oxford University Press, New York,
1993)). This constraint necessitates, both in Nature and in
the laboratory, the use of techniques that target specially
selected segments of the chosen starting scaffold; that is
to say, a combination of rational design and selection in
the experimental strategy for in vitro evolution.
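
The size argument above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the original disclosure. The average residue mass of 110 Da is an assumed round figure, and the helper name randomised_library_size is hypothetical. For scale, order-of-magnitude estimates of the ordinary matter in the observable universe are commonly quoted at around 10^53 kg; that figure is likewise an outside assumption and is not taken from this specification.

```python
import math

N_RESIDUES = 250       # size of the barrel quoted in the text
N_AMINO_ACIDS = 20

# Exact count of full-length variants (Python integers are arbitrary precision).
variants = N_AMINO_ACIDS ** N_RESIDUES
log10_variants = N_RESIDUES * math.log10(N_AMINO_ACIDS)

# Assumptions for the mass estimate: ~110 Da per residue, one molecule per variant.
DALTON_KG = 1.66053906660e-27
AVG_RESIDUE_DA = 110.0
mass_per_molecule_kg = N_RESIDUES * AVG_RESIDUE_DA * DALTON_KG
log10_library_mass_kg = log10_variants + math.log10(mass_per_molecule_kg)


def randomised_library_size(n_positions):
    """Protein-level variants when only n_positions are fully randomised."""
    return N_AMINO_ACIDS ** n_positions


if __name__ == "__main__":
    print(f"20^250 has {len(str(variants))} decimal digits")
    print(f"log10(library mass / kg) is about {log10_library_mass_kg:.1f}")
    for n in (2, 5, 7):
        print(n, "randomised positions:", randomised_library_size(n))
```

Restricting randomisation to a handful of loop positions brings the library down to sizes that can be transformed and plated, which is the practical point of targeting selected segments.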

The inventors used elements of a pre-existing binding site
for the phosphate and anthranilate structural motifs. CdRP
is the product of PRAI, and so the binding site of PRAI must
also bind CdRP. IGPS binds CdRP and so the inventors
reasoned that it has the potential to bind PRA.

The next component of the design was derived from the
detailed comparative analysis of structural and biochemical
data on IGPS and PRAI by Kirschner and coworkers (Kirschner
et al. Meth. Enzymol. 142, 386-397 (1987); Darimont et al.
Protein Sci. 7, 1221-1232 (1998); Hommel et al. Biochemistry
34, 5429-5439 (1995); Wilmanns et al. J. Mol. Biol. 223,
477-507 (1992); Wilmanns et al. Biochemistry 30, 9161-9169
(1991); Knochel et al. J. Mol. Biol. 262, 502-515 (1996);
Luger et al. Science 243, 206-210 (1989); Stehlin et al.
FEBS Letters 403, 268-272 (1997)). Using this information,
we superimposed the structures of IGPS and PRAI using the
program SETOR. From this comparison the active site lid in
each protein was identified. The IGPS active site is
covered by the N-terminal α0 helix, and by the β1-α1 (15
residues), β2-α2 (9 residues) and β6-α6 (11 residues) loops,
all located at the C-terminal side of the β-barrel. PRAI,
however, has a very different active site lid which is
mainly formed by the β2-α2 (10 residues), β6-α6 (11 residues)
and β8-α8 (12 residues) loops. The β2-α2 loop is involved in
binding the anthranilic acid moiety of the substrates PRA
and CdRP, and the β8-α8 loop comprises the phosphate binding
site. The superposition of the two structures reveals almost
identical locations but different orientations of the
phosphate binding site. Since the loops (β2-α2, β7-α7 and
β8-α8) are similarly arranged in the two enzymes, the target of
selection was solely the extra N-terminal end (helix α0 and
two bends), the β1-α1 loops and the β6-α6 loops.




The first step in the design included the deletion of 48
amino acid residues from the amino terminal end of IGPS;
this deletion mutant was called IGPS49. This mutant was
unstable, had a tendency to aggregate (Stehlin et al. FEBS
Letters 403, 268-272 (1997)) and was catalytically inactive
with respect to both IGPS and PRAI activities. Nucleic acid
encoding IGPS49 was expressed in E. coli, and the
protein formed inclusion bodies. Refolding chromatography
with immobilised minichaperones was employed to renature the
protein quantitatively (Altamirano et al. Proc. Natl. Acad.
Sci. USA 94, 3576-3578 (1997)). It had a circular dichroism
(CD) spectrum characteristic of a native α/β-barrel protein
and bound 3H-rCdRP, a specific inhibitor of IGPS, with a
stoichiometry of one mol of inhibitor per mol of IGPS49.

The IGPS49 scaffold was further modified by replacing 15
amino acid residues corresponding to the β1-α1 loop by new
randomised segments of 4 to 7 amino acid residues. Nucleic
acid encoding IGPS49 was used as template to create three
new libraries, IGPS49L1 (GKXXG), IGPS49L1RGD (GKXRGD) and
IGPS49L1SV (length size variation: GKXX, GKXXX, GKXXXX or
GKXXXXX), via PCR methodologies including overlap extension
PCR, inverse PCR and random primer PCR. The libraries were
analysed by PCR screening, by restriction analysis and by
sequencing. Members of each library were picked at random
and expressed in E. coli. The proteins appeared in the
soluble fraction but were prone to aggregation above a
concentration of 0.5 mg/mL. One of the protein samples was
denatured in 8 M urea and renatured using refolding
chromatography (Altamirano et al. Proc. Natl. Acad. Sci. USA
94, 3576-3578 (1997)). The refolded protein was soluble and
able to bind 3H-rCdRP, but it lacked catalytic activity.
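
As a rough guide to the theoretical size of such loop libraries, the sketch below counts DNA-level and protein-level variants for a given number of randomised positions. It is illustrative only and not part of the original disclosure. It assumes that each randomised position X is encoded by an NNS codon (N = A, C, G or T; S = C or G), as in the degenerate oligonucleotides listed in the Materials and Methods, and it uses the standard genetic code. The names CODON_TABLE, NNS_CODONS and library_stats are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# NNS: any base at positions 1 and 2, C or G at position 3.
NNS_CODONS = [a + b + c for a, b, c in product("ACGT", "ACGT", "CG")]
NNS_STOPS = [c for c in NNS_CODONS if CODON_TABLE[c] == "*"]
NNS_AMINO_ACIDS = {CODON_TABLE[c] for c in NNS_CODONS} - {"*"}


def library_stats(n_random):
    """Theoretical diversity for n_random NNS-encoded positions."""
    return {
        "dna_variants": len(NNS_CODONS) ** n_random,
        "protein_variants": len(NNS_AMINO_ACIDS) ** n_random,
        "fraction_without_stop": (1 - len(NNS_STOPS) / len(NNS_CODONS)) ** n_random,
    }


if __name__ == "__main__":
    print(len(NNS_CODONS), "NNS codons; stop codons:", NNS_STOPS)
    for n in range(1, 6):
        print(n, library_stats(n))
```

Under these assumptions even the longest length variant (five randomised positions) stays within a range that can be covered by transformation, in contrast to randomising the whole barrel.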




The next set of modifications involved the β6α6 loop,
including the introduction of an aspartate residue at
position 184 (acting as a general base in the active site)
(Darimont et al. Protein Sci. 7, 1221-1232 (1998); Wilmanns
et al. J. Mol. Biol. 223, 477-507 (1992)) and also the PRAI
consensus sequence GXGGXGQ (Wilmanns et al. J. Mol. Biol.
223, 477-507 (1992)), with the aim of improving the active
site lid. A new library including these modifications and
called IGPS49L1L6 was constructed using the IGPS49L1,
IGPS49L1RGD and IGPS49L1SV libraries as templates. One of
the new library members chosen at random was expressed in E.
coli and the corresponding protein was found to be soluble,
with a circular dichroism spectrum characteristic of a
typical α/β-barrel protein. Further, it was able to bind
3H-rCdRP, but lacked both PRAI and IGPS activity.

Mutation, recombination and in vivo selection 

An in vivo selection strategy for PRAI activity was
designed, based on complementation of E. coli JA300 (a
PRAI-deficient strain that does not grow in the absence of
tryptophan (Trp), and which is available from ATCC). In E.
coli, PRAI and IGPS are part of the same 45 kDa polypeptide
chain specified by the trpC gene. However, E. coli JA300
carries the W3110 (trpC1117) allele and so lacks isomerase
activity, but retains normal levels of synthase activity
(Clarke Proc. Natl. Acad. Sci. USA 77, 2173-2177 (1980);
Yanofsky et al. Genetics 69, 409-433 (1971); Yanofsky JAMA
218, 1026-1035 (1971)). Complementation provides an indication
that the specific clone contains a plasmid expressing an
IGPS variant with PRAI activity.

JA300 itself showed no ability to grow in the absence of
Trp. The initial parental clones (IGPS49, IGPS49L1,
IGPS49L1RGD and IGPS49L1SV) failed to grow in the absence of
Trp.

The DNA library IGPS49L1L6 was used to transform the JA300
strain. Approximately 3 x 10^4 E. coli transformants
expressing the resultant library were then plated on minimal
medium containing a range of tryptophan concentrations (0-25
µg/mL). The colonies (around 500) growing at low Trp
concentrations were selected. A first round of DNA shuffling
was performed with the pool of genes from the selected
clones using the method of Stemmer (Stemmer Proc. Natl.
Acad. Sci. USA 91, 10747-10751 (1994); Stemmer Nature 370,
389-391 (1994); Crameri et al. Nat. Biotechnol 14, 315-319
(1996)). Plating around 4 x 10^5 bacteria on a wide range of
Trp concentrations yielded 80 colonies. These were able to
grow at very low concentrations of Trp (< 1 µg/mL) and a
single clone was found to be capable of growing in the
absence of any exogenous Trp. Restriction-fragment length
polymorphism (RFLP) analysis of 30 clones chosen at random
revealed a minimum of 8 different patterns. A second round
of recombination was performed by DNA shuffling (Stemmer
Nature 370, 389-391 (1994)) and the staggered extension
process (StEP) (Zhao et al. Nat. Biotechnol 16, 258-261
(1998)), using the pool of 80 colonies selected from the
first round and synthetic DNA fragments encoding the
protein segments corresponding to loops β1α1, β6α6 and β4α4
from diverse species of PRAI. The in vivo selection yielded
360 colonies capable of growing in the absence of any
exogenous Trp.

Several controls were performed in order to show that the
ability to grow in the absence of Trp was a consequence of the
introduction of the reshuffled library (IGPS49L1L6-2 cycle)
containing the ivePRAI (in vitro-evolved PRAI) genes. As a
first control, the inventors cured the JA300 strain
previously transformed with the plasmids carrying the
library by growing the bacteria in the absence of
ampicillin. The cured cells were unable to grow in
ampicillin-containing medium and simultaneously lost the
ability to grow in the absence of Trp. Further, the plasmid
carrying the ivePRAI gene was used to transform fresh JA300
cells, prior to plating on minimal medium with added
ampicillin, streptomycin (Strep) and IPTG but in the
absence of Trp. These transformed cells were able to grow
within 18 h in the absence of Trp, and all the clones were
ampicillin resistant and Trp+ (see additional controls in the
Materials and Methods section below). On the basis of these
controls, it is believed that the PRAI activity
complementing the auxotrophy in JA300 cells originates from
the cloned IGPS variant genes and is not the product of any
reversion event.

In vitro-evolved PRAI 

The nucleic acids encoding the ivePRAI proteins from 30
clones were sequenced. Only 8 different sequences were
found. The largest colony from a plate of minimal medium
without Trp was selected for further biochemical
characterisation. The gene encoding the ivePRAI was
expressed and the protein purified. The new protein was
soluble. The CD spectra and the activity assay confirmed
that it was properly folded.

The ivePRAI has PRAI activity and does not have IGPS
activity in vitro. ivePRAI has a specificity constant
of 4.8 x 10^7 s^-1 M^-1 (Table I), which is 6-fold higher
than that of either the natural enzyme (E. coli wild-type
bienzyme) or the isolated PRAI domain (Table I). This
improved activity results primarily from a 15-fold enhanced
affinity of the evolved protein for PRA (Table I).
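
The fold changes quoted above can be related to one another through the definition of the specificity constant, kcat/KM. The short Python sketch below is illustrative only and not part of the original disclosure. It uses only the figures stated in the text (4.8 x 10^7 s^-1 M^-1, 6-fold and 15-fold) and makes the simplifying assumption that "affinity" can be read as 1/KM; the actual kinetic parameters are those of Table I, which are not reproduced here. The function name implied_kcat_fold is hypothetical.

```python
IVEPRAI_SPECIFICITY = 4.8e7   # kcat/KM of ivePRAI in s^-1 M^-1, from the text
FOLD_SPECIFICITY = 6.0        # ivePRAI relative to the natural enzyme
FOLD_AFFINITY = 15.0          # read here as a 15-fold lower KM (an assumption)

# Specificity constant implied for the natural enzyme.
implied_natural_specificity = IVEPRAI_SPECIFICITY / FOLD_SPECIFICITY


def implied_kcat_fold(fold_specificity, fold_affinity):
    """Fold change in kcat implied by kcat/KM when affinity is taken as 1/KM."""
    return fold_specificity / fold_affinity


if __name__ == "__main__":
    print(f"implied natural kcat/KM: {implied_natural_specificity:.1e} s^-1 M^-1")
    print(f"implied kcat ratio: {implied_kcat_fold(FOLD_SPECIFICITY, FOLD_AFFINITY):.2f}")
```

On this reading the gain in kcat/KM comes from tighter substrate binding rather than from faster turnover, which agrees with the statement that the improvement results primarily from enhanced affinity.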

The structure of ivePRAI resembles IGPS and differs
significantly from that of PRAI. The sequence identity of
ivePRAI to PRAI is 28% and to IGPS is 90% (Figure 3).
Importantly, the binding site for the phosphate ion in the
IGPS scaffold of ivePRAI is at the N-terminal turn of the
additional α-helix α8' that is located in the loop between
strand β8 and helix α8. In the wild-type PRAI, the
additional α-helix α8' is missing and the phosphate ion has
a different orientation (Wilmanns et al. J. Mol. Biol. 223,
477-507 (1992)). Further, the site for binding the
anthranilate moiety of PRA in ivePRAI is also inherited from
IGPS and is quite different from that of PRAI. The catalytic
constants of ivePRAI and PRAI are similar (Table I).

These experiments demonstrate that the two classes of
α/β-barrels, described above, can be interconverted by altering
the lid regions. The results demonstrate the divergent
evolution of two enzymes from the pathway for the
biosynthesis of tryptophan, which may mimic natural
divergent evolution (Sterner et al. Protein Science 5, 2000-
2008 (1996)). For in vitro design purposes, a new function
in the scaffold of an α/β-barrel protein was provided using
the combined approach of rational design, in vitro mutation,
recombination and in vivo selection.

MATERIALS AND METHODS 



Reagents

Restriction enzymes and T4 DNA ligase were obtained from
BioLabs. Taq polymerase and Wizard DNA preparation kits were
obtained from Promega. Ultrapure dNTPs were obtained from
Boehringer Mannheim. DNase I and other reagents were
obtained from Sigma.

Chemical syntheses 

rCdRP and 3H-rCdRP were prepared as described by Bisswanger
et al. (Bisswanger et al. Biochemistry 18, 5946-5953
(1979)). The specific activity of the 3H-rCdRP was 95.36
kBq/µmol.



Preparation of DNA

The gene encoding IGPS (residues 1-259) was amplified from
E. coli BL21 genomic DNA by PCR (94 °C, 1 min; 37 °C, 1 min;
72 °C, 1 min; 25 cycles) using primers 'IGPSFULL' and
'IGPSFLAGREV'. The PCR product was digested with Nco I and
Bsp HI and the 820 bp fragment cloned into the Nco I site of
pNS3785 (Sternberg et al. Proc. Natl. Acad. Sci. USA 92,
1609-1613 (1995)) to create pJB122. pJB122 thus encodes a
polypeptide chain comprised of residues 49-259 of IGPS fused
directly to the Flag-tag GSDYKDDDDK at the C-terminus of
IGPS. The gene encoding IGPS49 (residues 49-259) was
amplified by PCR from pJB122 using primers 'IGPS49FSP1' and
'JB122SEQ' and was then digested with Fsp I and Bam HI.
pJB124 was created by ligation of the 630 bp PCR fragment
with a 4700 bp fragment generated from pJB122 by digestion
with Nco I, blunt-ending with Klenow polymerase, and further
digestion with Bam HI. The gene encoding IGPS49 was used as
a template for further modifications and recloned in the
same vector described above; a set of different plasmids
(pMA) carrying all the libraries was created.
Oligonucleotides

The following oligonucleotides were used.

IGPSFULL:
5' CATGACCTTGCGGCCCAGCCGGCCATGGCGCAAACCGTTTTAGCGAAAATCGTCGC 3'

IGPSFLAGREV:
5' ATCGTCATAATCATGAACTACTTGTCATCGTCGTCCTTGTAGTCGGATCCTACTTTATTCTCACCCAGCAACACCCGGCGCACGG 3'

IGPS49L1:
5' NNSNNSNNSGGTGCACGCATTGCCGCCATTTATAAACATTACGC 3'

IGPS49Lr:
5' ACCGCACTCCAGAATAAATGCCCTTCC 3'

IGPS49FSP1:
5' CATGACCTTGTGCGCATTTATTCTGGAGTGC 3'

JB122SEQ:
5' CCCTGCGGCTGGTAATGG 3'

IGPS49L1L6r:
5' CCCACCSNNGCCGTTGATGCCAACGACCTTTGCCCC 3'

IGPS(ApaLI):
5' CGCCGTGCGTGCACCCTGTAGCGC 3'

L1 (6aa):

L1APAL1:
5' TTTATTCTGGAGTGCGGTCTANNSNNSNNSGGTGCACGCATTGCCGCC 3'

L1APALre:
5' GGCGGCAATGCGTGCACCSNNSNNSNNTAGACCGCACTCCAGAATAAA 3'

L6:
5' GCAAAGGTCGTTGGCATCAACGGCNNSGGTGGGNNSGGTNNSNNSATTGATCTCAACCGTACC 3'

L6rev:
5' GGTACGGTTGAGATCAATSNNSNNACCSNNCCCACCSNNGCCGTTGATGCCAACGACCTTTGC 3'
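
Degenerate forward and reverse primers of this kind can be checked against one another by taking the reverse complement with the IUPAC ambiguity codes (N pairs with N, S with S). The following Python sketch is illustrative only and not part of the original disclosure; the function name reverse_complement and the constant names are hypothetical. It compares the L1APAL1 sequence with the reverse primer listed immediately after it, and checks that the GTGCAC motif, which is the ApaLI recognition sequence, is present in the forward primer.

```python
# IUPAC nucleotide codes and their complements (A<->T, C<->G, K<->M, R<->Y,
# B<->V, D<->H; N, S and W are self-complementary).
_COMPLEMENT = str.maketrans("ACGTNSWKMRYBDHV", "TGCANSWMKYRVHDB")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence that may contain IUPAC codes."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Sequences copied from the oligonucleotide list above (5' to 3').
L1_FORWARD = "TTTATTCTGGAGTGCGGTCTANNSNNSNNSGGTGCACGCATTGCCGCC"
L1_REVERSE = "GGCGGCAATGCGTGCACCSNNSNNSNNTAGACCGCACTCCAGAATAAA"

APALI_SITE = "GTGCAC"

if __name__ == "__main__":
    print("reverse primer matches:", reverse_complement(L1_FORWARD) == L1_REVERSE)
    print("GTGCAC present in forward primer:", APALI_SITE in L1_FORWARD)
```

Note that NNS on the coding strand appears as SNN on the reverse primer, which is why the reverse oligonucleotides in the list carry SNN triplets.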

DNA shuffling 

The shuffling of the pool of genes from the first cycle of
selection was performed using 60 to 80 bp fragments,
generated by DNase I (Sigma) and reassembled by PCR without
added primers (Stemmer Proc. Natl. Acad. Sci. USA 91, 10747-
10751 (1994)). A PCR program of 95 °C, 1 min; 40 cycles (94
°C, 30 s; 55 °C, 30 s; 72 °C, 1 min + 5 s per cycle) was
used. After 40-fold dilution of the primerless product
into PCR mix with 1 µM of each primer and 20 additional
cycles of PCR (94 °C, 30 s; 55 °C, 30 s; 72 °C, 2 min), a
single product of 650 bp was obtained. The shuffled material
was cloned back into the vector described above and used to
transform the PRAI-deficient E. coli strain JA300 (Clarke
Proc. Natl. Acad. Sci. USA 77, 2173-2177 (1980); Yanofsky &
Horn J. Bacteriol. 176, 6245-6254 (1994)).



The second cycle of shuffling was performed on the pool of 
chimaeras selected in the first round together with 
synthetic DNA fragments encoding the protein segments 
corresponding to loops β1α1, β6α6 and β4α4 from diverse 
species of PRAI. 
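
The reassembly program described in this section lengthens 
the extension step by 5 s in every cycle, so the total hold 
time is not simply cycles multiplied by step length. The 
Python sketch below works through that arithmetic. It 
assumes that the first cycle uses the base extension time 
(1 min) and that each later cycle adds 5 s; ramp times 
between temperatures are ignored. The function names are 
hypothetical and are not taken from the original protocol. 

```python
def extension_times(cycles, base_s, increment_s=0):
    """Extension time (s) per cycle; cycle 1 uses base_s (assumption)."""
    return [base_s + increment_s * i for i in range(cycles)]


def program_hold_time_s(initial_s, cycles, denature_s, anneal_s,
                        base_ext_s, increment_s=0):
    """Sum of programmed hold times in seconds, ignoring ramping."""
    ext = extension_times(cycles, base_ext_s, increment_s)
    return initial_s + sum(denature_s + anneal_s + e for e in ext)


# Primerless reassembly: 95 C 1 min; 40 x (94 C 30 s; 55 C 30 s; 72 C 1 min + 5 s/cycle)
reassembly_s = program_hold_time_s(60, 40, 30, 30, 60, 5)

# Amplification with primers: 20 x (94 C 30 s; 55 C 30 s; 72 C 2 min)
amplification_s = program_hold_time_s(0, 20, 30, 30, 120)

print("last reassembly extension (s):", extension_times(40, 60, 5)[-1])
print("reassembly hold time (min):", reassembly_s / 60)
print("amplification hold time (min):", amplification_s / 60)
```

Under these assumptions the extension step grows from 60 s 
to 255 s over the 40 reassembly cycles. 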
Staggered extension process (StEP) 

StEP was performed as described in Zhao et al. Nat. 
Biotechnol. 16, 258-261 (1998). A PCR program of 92 cycles 
(94 °C, 30 s; 55 °C, 4 s) was used. At this step the parent 
DNA (purified from a dam+ strain) was removed using Dpn I. 
A second PCR was then performed with added primers in order 
to amplify the full-length product (95 °C, 2 min; 25 cycles 
of 94 °C, 30 s; 55 °C, 1 min; 72 °C, 5 min; then 72 °C, 
30 min). 

Selection 

JA300 cells were plated on minimal medium (M9) with 
ampicillin (50 µg/mL), streptomycin (20 µg/mL) and 0.7 mM 
IPTG, containing a range of Trp concentrations, and 
incubated at 37 °C for 24-36 h. About 500 colonies from the 
plates with the lower Trp levels were pooled and cultured 
either in liquid medium (2X TY + Amp + Strep) or in minimal 
medium (M9 + Amp + Strep + 0.7 mM IPTG) with a similar level 
of Trp. Plasmid DNA was prepared from this liquid culture. 

Additional control experiments: 

Plasmid DNA from the pool of clones selected after the 
second round of recombination was prepared and used to 
transform fresh JA300 cells, prior to plating on minimal 
medium with added ampicillin, streptomycin (Strep) and IPTG 
but in the absence of Trp. These transformed cells were able 
to grow in the absence of Trp in 18 h. Additionally, the 
plasmid DNA from these cells was purified and the insert 
excised by restriction digestion and recloned into a fresh 
vector. After transforming into fresh JA300 cells, positive 
clones were obtained in the absence of Trp, demonstrating 
that the activity was insert dependent. The same result was 
obtained when the DNA was amplified by PCR, recloned and 
introduced into fresh JA300 cells. 



Refolding chromatography 

Protein renaturation was performed as described in 
Altamirano et al. Proc. Natl. Acad. Sci. USA 94, 3576-3578 
(1997). 

Protein purification 

After refolding experiments, the proteins (IGPS49, IGPS49L1 
and IGPS49L1L6) were purified as described in Bisswanger et 
al. Biochemistry 18, 5946-5953 (1979). 

PRAI activity assay 

All the kinetic and binding experiments were performed as 
described in Kirschner et al. Meth. Enzymol. 142, 386-397 
(1987) and Hommel et al. Biochemistry 34, 5429-5439 (1995). 

Sequence alignment 

The amino acid sequences of ivePRAI, IGPS and PRAI were 
aligned using a sequence similarity search of SCOP sequences 
based on the BLAST algorithm (Altschul et al. J. Mol. Biol. 
215, 403-410 (1990)). Figure 3 shows the sequence alignment 
based on the ClustalW algorithm (matrix BLOSUM30). 
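
As a small illustration of how an alignment such as the one 
in Figure 3 can be summarised, the Python sketch below 
computes percent identity between two already-aligned 
sequences. It is not BLAST or ClustalW and performs no 
alignment itself; the two short sequences are invented 
placeholders, not ivePRAI, IGPS or PRAI, and the choice to 
ignore gapped columns is one convention among several. 

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments, for illustration only.
aligned_a = "MKT-AILG"
aligned_b = "MRTQAI-G"

identity = percent_identity(aligned_a, aligned_b)
print(f"identity over ungapped columns: {identity:.1f}%")
```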
Table IV 

2,5-DIKETO-D-GLUCONIC ACID REDUCTASE A 
ALCOHOL DEHYDROGENASE (NADP+) 
ALDEHYDE REDUCTASE 
3-ALPHA-HYDROXYSTEROID DEHYDROGENASE (B-SPECIFIC) 
IMP DEHYDROGENASE 
3-ALPHA-HYDROXYSTEROID DEHYDROGENASE (A-SPECIFIC) 
L-LACTATE DEHYDROGENASE (CYTOCHROME) 
(S)-2-HYDROXY-ACID OXIDASE 
DIHYDROOROTATE OXIDASE 
TRIMETHYLAMINE DEHYDROGENASE 
NADPH DEHYDROGENASE 
5,10-METHYLENETETRAHYDROFOLATE REDUCTASE (FADH) 
ALKANAL MONOOXYGENASE (FMN-LINKED) 
TRANSALDOLASE 
CYCLOMALTODEXTRIN GLUCANOTRANSFERASE 
NICOTINATE-NUCLEOTIDE PYROPHOSPHORYLASE (CARBOXYLATING) 
QUEUINE TRNA-RIBOSYLTRANSFERASE 
THIAMIN-PHOSPHATE PYROPHOSPHORYLASE 
DIHYDROPTEROATE SYNTHASE 
PYRUVATE KINASE 
PYRUVATE, PHOSPHATE DIKINASE 
1-PHOSPHATIDYLINOSITOL PHOSPHODIESTERASE 
1-PHOSPHATIDYLINOSITOL-4,5-BISPHOSPHATE PHOSPHODIESTERASE 
ARYLDIALKYLPHOSPHATASE 
DEOXYRIBONUCLEASE IV (PHAGE T4-INDUCED) 
ALPHA-AMYLASE 
BETA-AMYLASE 
CELLULASE 
ENDO-1,4-BETA-XYLANASE 
OLIGO-1,6-GLUCOSIDASE 
BETA-GLUCOSIDASE 
BETA-GALACTOSIDASE 
BETA-GLUCURONIDASE 
GLUCAN ENDO-1,3-BETA-D-GLUCOSIDASE 
BETA-N-ACETYLHEXOSAMINIDASE 
GLUCAN 1,4-ALPHA-MALTOTETRAHYDROLASE 
ISOAMYLASE 
LICHENINASE 
MANNAN ENDO-1,4-BETA-MANNOSIDASE 
6-PHOSPHO-BETA-GALACTOSIDASE 
CELLULOSE 1,4-BETA-CELLOBIOSIDASE 
MANNOSYL-GLYCOPROTEIN ENDO-BETA-N-ACETYLGLUCOSAMIDASE 
NEOPULLULANASE 
THIOGLUCOSIDASE 
UREASE 
ADENOSINE DEAMINASE 
PHOSPHOENOLPYRUVATE CARBOXYLASE 
RIBULOSE-BISPHOSPHATE CARBOXYLASE 
INDOLE-3-GLYCEROL-PHOSPHATE SYNTHASE 
FRUCTOSE-BISPHOSPHATE ALDOLASE 
2-DEHYDRO-3-DEOXYPHOSPHOGLUCONATE ALDOLASE 
2-DEHYDRO-3-DEOXYPHOSPHOHEPTONATE ALDOLASE 
N-ACETYLNEURAMINATE LYASE 
PHOSPHOPYRUVATE HYDRATASE 
TRYPTOPHAN SYNTHASE 
PORPHOBILINOGEN SYNTHASE 
GLUCARATE DEHYDRATASE 
DIHYDRODIPICOLINATE SYNTHASE 
ALANINE RACEMASE 
MANDELATE RACEMASE 
RIBULOSE-PHOSPHATE 3-EPIMERASE 
TRIOSE PHOSPHATE ISOMERASE 
XYLOSE ISOMERASE 
PHOSPHORIBOSYLANTHRANILATE ISOMERASE 
PHOSPHOENOLPYRUVATE MUTASE 
METHYLMALONYL-COA MUTASE 
MUCONATE CYCLOISOMERASE 
CHLOROMUCONATE CYCLOISOMERASE 
CONCANAVALIN B 
NARBONIN 
NONFLUORESCENT FLAVOPROTEIN 
PHOSPHOTRIESTERASE HOMOLOGY PROTEIN 
YEAST HYPOTHETICAL PROTEIN 
Annex 1 

Nature's strategies to evolve new proteins. Nature may have used a combinatorial strategy for 
evolving new catalytic activities from pre-existing α/β-barrel enzymes. There are at least three 
such strategies: 





[Reaction scheme; structures not reproduced. Recoverable labels: Enz, Me2+, Enz-B:, Enz-BH+.] 



1) A rate-limiting step in the catalytic mechanism is retained and the substrate-binding site (the 
hydrophobic pocket and charged region) evolved by point mutations, for instance in the 
enolase superfamily (refs 16, 17). 

The fate of the intermediate is determined by the structure of each active site, so that the overall 
reactions differ and may involve 1,1-proton transfer (racemization), as in mandelate racemase, 
or β-elimination of water, as in enolase. 

2) In the superfamily of the metal-dependent hydrolases the general mechanistic features are 
conserved (e.g. the metal binding site) (refs 7, 8), but a few changes in the charged and 
hydrophobic regions of the binding site allow the catalysis of a multitude of different reactions. 



[Scheme; structure not reproduced. Recoverable labels: metal centre (Me, OH) held by enzyme residues (Enz).] 

Overall reactions: urease; phosphodiesterase. 
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3). In Class I Aldolases and Class II aldolases, TIM and Xylose isomerase, PRAI and IGPS.'.U 
the structure of the binding site may be retained and that of the active-site lid is modified by 

> 

insertions, deletions and recombination. 



[Reaction schemes for IGPS and PRAI; structures not reproduced. Recoverable labels: Enz-X+, Enz-AH, Enz-B, Enz-A.] 



WO 01/42432 



PCT/GB00/04661 




CLAIMS 

1. A method of obtaining an enzyme that catalyses a desired 
reaction on a target substrate, the method comprising: 

selecting a parent α/β barrel enzyme that comprises a 
scaffold and an active site lid and which either 

(i) binds the target substrate, or 

(ii) binds a similar substrate and catalyses a reaction 
of the same type as said desired reaction; 

modifying the amino acid sequence of the N-terminal 
segment, β1-α1 loop, β6-α6 loop and/or C-terminal segment of 
the parent α/β barrel enzyme, and optionally altering 
additional amino acid residues within the parent α/β barrel 
enzyme, whereby one or more candidate product enzymes is 
obtained; 

selecting from the candidate product enzymes a product 
enzyme that comprises a scaffold and an active site lid, which 
product enzyme catalyses the desired reaction on the target 
substrate. 

2. A method according to claim 1 wherein the parent enzyme 
comprises a scaffold that binds the target substrate. 

3. A method according to claim 1 or claim 2 wherein said 
modifying of the parent enzyme to obtain one or more candidate 
product enzymes comprises grafting to the scaffold of the 
parent enzyme an active site lid of another enzyme. 

4. A method according to any one of claims 1 to 3 comprising 
modifying the parent α/β barrel enzyme by deleting an 
N-terminal segment, shortening the β1-α1 loop, and modifying 
the β6-α6 loop. 
5. A method according to any one of claims 1 to 3 comprising 
modifying the parent α/β barrel enzyme by adding an N-terminal 
segment, lengthening the β1-α1 loop, and modifying the β6-α6 
loop. 



6. A method according to any one of claims 1 to 5 comprising 
modifying an N-terminal segment, the β1-α1 loop, and the β6-α6 
loop, and optionally altering one or more amino acid residues 
within one or more of the loops β8-α8, β7-α7 and β5-α5. 

7. A method according to any one of claims 1 to 5 comprising 
altering one or more amino acid residues between the loops 
β7-α7 and β8-α8. 

8. A method according to any one of claims 1 to 5 comprising 
altering one or more amino acid residues in one or more of the 
loops β2-α2, β4-α4 and β3-α3. 

9. A method according to any one of claims 1 to 8 comprising 
modifying the parent α/β barrel enzyme to introduce one or 
more amino acid sequence motifs or residues in accordance with 
a consensus for α/β barrel enzymes that catalyse the desired 
reaction or a reaction of the same type as the desired 
reaction. 

10. A method according to any one of claims 1 to 9 comprising 
random mutagenesis of residues within the parent α/β barrel 
enzyme, and selection of a candidate enzyme on ability to bind 
said target substrate. 

11. A method according to any one of claims 1 to 9 comprising 
random mutagenesis of residues within the parent α/β barrel 
enzyme, and selection of product enzyme on ability to catalyse 
the desired reaction on said target substrate. 

12. A method according to any one of claims 1 to 11 further 
comprising, following the obtaining of said product enzyme, 
providing nucleic acid encoding the product enzyme. 

13. A method according to claim 12 wherein said nucleic acid 
is provided operably linked to regulatory sequences within an 
expression vector for expression of the encoded product 
enzyme. 

14. A method according to any one of claims 1 to 11 further 
comprising, following the obtaining of said product enzyme, 
synthesizing said product enzyme by expression from encoding 
nucleic acid in a recombinant system. 

15. A method according to claim 14 further comprising 
isolating and/or purifying said product enzyme. 

16. A method according to claim 14 or claim 15 further 
comprising formulating said product enzyme into a composition 
comprising at least one additional component. 

FIG. 4 
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